Bernardo Di Chiara, Data Analyst
http://fi.linkedin.com/in/bernardodichiara
Last plotted day: see the end of the file
1. Executive Summary
....1.1. References
2. Setup
3. Defining the Needed Functions
....3.1. Dataframes and lists handling
....3.2. Plots
....3.3. Project-specific functions
4. Dumping and Collecting the Data
5. Data Analysis
....5.1. Summary
....5.2. Preliminary Data Analysis
....5.3. Data Cleansing
....5.4. Data Preparation
............5.4.1. New datasets with no NaN, no GPS coordinates / list of days / list of Countries
............5.4.2. Population age data
............5.4.3. World Data
............5.4.4. Finnish Data
............5.4.5. Data from other Scandinavian Countries
............5.4.6. Data from other European Countries
............5.4.7. Data from UK
............5.4.8. Data from US
............5.4.9. Data from China
....5.5. Summary of the created datasets
6. Domain-Specific Concepts
7. Data Visualization
....7.1. Overview
............7.1.1. General Comments to the Plots
............7.1.2. A Reference Curve Set
....7.2. Finnish Internal Situation
....7.3. Comparison with Other Scandinavian Countries
....7.4. Comparison with Italy and other European Countries
....7.5. Comparison with UK and US
....7.6. Normalizing by Country population
....7.7. Normalizing by Country population and population density
....7.8. Situation in China
....7.9. Situation in Italy
....7.10. World View
............7.10.1. Lethality
8. Statistics
....8.1. World view
....8.2. Top Ten Countries
....8.3. Finland
9. Conclusions
10. Acknowledgements
This notebook contains visualizations related to the spread of the Coronavirus COVID-19 in Finland.
The data is taken from Johns Hopkins University (JHU) /1/.
There are a few good dashboards on the Web about this topic (for example, by Johns Hopkins University /2/ and by Tableau /3/). In addition, there is a good site with the latest information about Finland broken down by Region /4/. Another very useful source of information is the European Centre for Disease Prevention and Control /5/. Still, it might be beneficial to manipulate the data in order, for example, to compare Finnish curves with curves from other Countries.
Having updated charts is very useful both for authorities and for the population in order to make fact-based decisions that help to contain the positive cases, so as not to overload the hospitals and thereby minimize the casualties.
Comparing Finnish curves to those of neighboring Countries might provide useful insights since, in addition to the geographical proximity and similar weather, those Countries have certain similarities in culture, behavior patterns and maybe genetics.
A line plot containing confirmed cases each day as well as recovered and deceased cases in Finland has been produced. The active cases have been shown in the same plot.
A second line plot containing the new confirmed daily cases in Finland, which shows the speed at which the virus is spreading, has been added.
Finnish curves have been compared to the curves of the other Scandinavian Countries, of a few other European Countries, and of the UK and the US.
Plots showing the number of confirmed cases per capita have been created to eliminate the population variable from the comparisons.
Finally, plots with worldwide data have been produced.
Bar plots containing data of the most affected Countries have been added.
Due to the criticality of this information, no recommendations are included in this paper. Currently, Doctors and Authorities are the best sources for such recommendations.
If you are not interested in the code, go to section 6 and onward and focus on the plots, the tables and the plain text.
DISCLAIMER:
The spread of virus follows the rules of mathematics and statistics (Dr. Katharina Hauck, https://www.imperial.ac.uk/people/k.hauck).
/1/ [GitHub Repository by Johns Hopkins University](https://github.com/CSSEGISandData/COVID-19)
https://github.com/CSSEGISandData/COVID-19
/2/ [Dashboard by Johns Hopkins University with world-wide view](https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6)
https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
/3/ Dashboard by Tableau with both global and Country-specific data
https://public.tableau.com/profile/covid.19.data.resource.hub#!/vizhome/COVID-19Cases_15840488375320/COVID-19Cases
/4/ Latest news about Finland broken by Region
https://finland-coronavirus-map.netlify.com/
/5/ European Centre for Disease Prevention and Control
https://www.ecdc.europa.eu/en/novel-coronavirus-china
/6/ Coursera: Let's Talk About COVID-19
https://www.coursera.org/learn/covid-19/home/welcome
# Importing the needed packages
import os
import datetime as dt
import regex as re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Displaying all the dataframe columns
pd.set_option('display.max_columns', None)
def df_basic_data(dfname):
    '''
    This function prints basic information about a given dataframe.
    The function takes the dataframe itself (not its name) as input parameter.
    '''
    import pandas as pd
    # Fetching the dataframe name
    name = [x for x in globals() if globals()[x] is dfname][0]
    print("Dataframe name:", name, "\n")
    print("Dataframe length:", len(dfname), "\n")
    print("Number of columns:", len(dfname.columns), "\n")
    # Columns data types
    data_types = dfname.dtypes
    # Distinct values
    distint_values = dfname.apply(pd.Series.nunique)
    # Amount of null values
    null_values = dfname.isnull().sum()
    print("Dataframe's columns names, column data types, amount of distinct "
          "(non null) values\n"
          "and amount of null values for each column:")
    df_index = ['Data_Type',
                'Amount_of_Distint_Values',
                'Amount_of_Null_Values']
    col_types_dist_null = pd.DataFrame([data_types,
                                        distint_values,
                                        null_values],
                                       index=df_index)
    return col_types_dist_null.transpose()
def calc_increments(listname):
    '''
    This function:
    takes a list,
    calculates the delta between each element and its predecessor,
    returns the result in a new list having the same length as the original list
    '''
    # Initializing an empty list of floats to contain the increments
    increments = []
    # Adding zero as the first element
    increments.append(0.0)
    # Looping through all the occurrences except the first one
    for i in range(1, len(listname)):
        # Calculating the increment
        delta = listname[i]-listname[i-1]
        # Adding the result to the list
        increments.append(delta)
    # Returning the result
    return increments
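The same day-over-day deltas can also be computed with NumPy. The block below is only a sketch and not part of the original notebook: the name `calc_increments_np` is hypothetical, and the input values are made up for illustration.

```python
import numpy as np


def calc_increments_np(values):
    """Return day-over-day deltas, with 0.0 in the first position."""
    arr = np.asarray(values, dtype=float)
    # np.diff gives arr[i] - arr[i-1]; 0.0 is prepended so that the
    # result has the same length as the input
    return np.concatenate(([0.0], np.diff(arr))).tolist()


# Made-up cumulative series used only for illustration
cumulative = [0, 1, 1, 4, 9]
increments = calc_increments_np(cumulative)
print(increments)
```

For this input the increments are 0, 1, 0, 3 and 5, which matches what the loop-based `calc_increments` would return.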
def cust_line_plot(*parameters,
figsize_w=8, figsize_h=6,
title=None,
title_fs=16, title_offset=20,
rem_borders=False,
label_fs=12, tick_fs=6,
x_label=None,
rot=0,
y_label=None,
legend=False, leg_fs=10, legend_loc=0,
first_line_x=None, first_line_x_l=None,
second_line_x=None, second_line_x_l=None,
third_line_x=None, third_line_x_l=None,
fourth_line_x=None, fourth_line_x_l=None,
special_line_x=None, special_line_x_l=None):
"""
This function plots a line plot for the provided data
and customizes the way the chart looks by using the value of
the provided parameters.
Keyword arguments:
parameters -- One or more (mandatory) tuples of 6 elements, each containing:
a list with the x values,
a list with the y values,
a string containing the selected marker,
a string containing the selected line style,
an integer (from 0 to 9) selecting the seaborn-deep
color,
a string containing the text for the legend
figsize_w -- The width of the plot area
figsize_h -- The height of the plot area
title -- A string containing the title of the chart
title_fs -- The title font size
title_offset -- Distance between the title and the top of the chart
rem_borders -- If True the top and right borders are removed
(default: False)
label_fs -- x and y axis labels' font size
tick_fs -- The tick values font size
x_label -- Label for the x-axis (string)
rot -- The rotation angle of the tick values
y_label -- Label for the y-axis (string)
legend -- A boolean variable that tells if to plot a legend
leg_fs -- Font size for the legend
legend_loc -- An integer from 0 to 9 controlling the legend location
first_line_x
second_line_x
third_line_x
fourth_line_x -- x coordinates of vertical lines
first_line_x_l
second_line_x_l
third_line_x_l
fourth_line_x_l -- legend text for the corresponding lines
"""
import matplotlib.pyplot as plt
import seaborn as sns
# Creating a new figure
plt.figure(figsize=(figsize_w, figsize_h))
# Defining the used style
color_list = sns.color_palette(palette='deep')
# Adding a title (with some distance to the top of the plot)
plt.title(title, fontsize=title_fs, pad=title_offset)
# Removing the top and right borders if so defined
if rem_borders is True:
sns.despine(top=True, right=True, left=False, bottom=False)
# Initializing an empty list to contain the legend text
leg_text_l = []
for param in parameters:
# Extracting the values given in parameters
x = param[0]
y = param[1]
mark = param[2]
ls = param[3]
col_numb = param[4]
leg_text = param[5]
# Appending the string to the list
leg_text_l.append(leg_text)
# Creating the line plots
plot = plt.plot(x, y, marker=mark, linestyle=ls, color=color_list[col_numb])
# If a label for the x axis is provided, showing it on the x axis
if x_label:
plt.xlabel(x_label, fontsize=label_fs)
plt.xticks(fontsize=tick_fs, rotation=rot)
# If a label for the y axis is provided, showing it on the y axis
if y_label:
plt.ylabel(y_label, fontsize=label_fs)
plt.yticks(fontsize=tick_fs)
# Adding vertical lines
if first_line_x:
plt.axvline(x=first_line_x, color='grey', linestyle=':')
leg_text_l.append(first_line_x_l)
if second_line_x:
plt.axvline(x=second_line_x, color='grey', linestyle='--')
leg_text_l.append(second_line_x_l)
if third_line_x:
plt.axvline(x=third_line_x, color='grey', linestyle='-.')
leg_text_l.append(third_line_x_l)
if fourth_line_x:
plt.axvline(x=fourth_line_x, color='grey', linestyle='-')
leg_text_l.append(fourth_line_x_l)
if special_line_x:
plt.axvline(x=special_line_x, color=color_list[6], linestyle='--')
leg_text_l.append(special_line_x_l)
# Adding a legend
if legend:
plt.legend(labels=leg_text_l, fontsize=leg_fs, loc=legend_loc,
facecolor="white", framealpha=1)
# Showing the plot without additional text
plt.show()
def cust_bar_plot(parameters,
figsize_w=8, figsize_h=6,
title=None, title_fs=16, title_offset=20,
rem_borders=False,
label_fs=12, tick_fs=6,
x_label=None,
rot=0,
y_label=None,
legend=False,
leg_fs=10,
legend_loc=0,
first_line_x=None, first_line_x_l=None,
second_line_x=None, second_line_x_l=None,
third_line_x=None, third_line_x_l=None,
fourth_line_x=None, fourth_line_x_l=None,
special_line_x=None, special_line_x_l=None,
first_line_y=None, first_line_y_l=None,
second_line_y=None, second_line_y_l=None,
third_line_y=None, third_line_y_l=None):
"""
This function plots a bar plot for the provided data
and customizes the way the chart looks by using the value of
the provided parameters.
Keyword arguments:
parameters -- A (mandatory) tuple of 4 elements containing:
a list with the x values,
a list with the y values,
an integer (from 0 to 9) selecting the seaborn-deep
color,
a string containing the text for the legend
figsize_w -- The width of the plot area
figsize_h -- The height of the plot area
title -- A string containing the title of the chart
title_fs -- The title font size
title_offset -- Distance between the title and the top of the chart
rem_borders -- If True the top and right borders are removed
(default: False)
label_fs -- x and y axis labels' font size
tick_fs -- The tick values font size
x_label -- Label for the x-axis (string)
rot -- The rotation angle of the tick values
y_label -- Label for the y-axis (string)
legend -- A boolean variable that tells if to plot a legend
leg_fs -- Font size for the legend
legend_loc -- An integer from 0 to 9 controlling the legend location
first_line_x
second_line_x
third_line_x
fourth_line_x -- x coordinates of vertical lines
first_line_y
second_line_y
third_line_y -- y coordinates of horizontal lines
first_line_x_l
second_line_x_l
third_line_x_l
fourth_line_x_l
first_line_y_l
second_line_y_l
third_line_y_l -- legend text for the corresponding lines
"""
import matplotlib.pyplot as plt
import seaborn as sns
# Creating a new figure
plt.figure(figsize=(figsize_w, figsize_h))
# Defining the used style
color_list = sns.color_palette(palette='deep')
# Adding a title (with some distance to the top of the plot)
plt.title(title, fontsize=title_fs, pad=title_offset)
# Removing the top and right borders if so defined
if rem_borders is True:
sns.despine(top=True, right=True, left=False, bottom=False)
# Initializing an empty list to contain the legend text
leg_text_l = []
# Extracting the values given in parameters
x = parameters[0]
y = parameters[1]
col_numb = parameters[2]
leg_text = parameters[3]
# Creating the bar plot
plot = plt.bar(x, y, color=color_list[col_numb])
# If a label for the x axis is provided, showing it on the x axis
if x_label:
plt.xlabel(x_label, fontsize=label_fs)
plt.xticks(fontsize=tick_fs, rotation=rot)
# If a label for the y axis is provided, showing it on the y axis
if y_label:
plt.ylabel(y_label, fontsize=label_fs)
plt.yticks(fontsize=tick_fs)
# Adding vertical lines
if first_line_x:
plt.axvline(x=first_line_x, color='grey', linestyle=':')
leg_text_l.append(first_line_x_l)
if second_line_x:
plt.axvline(x=second_line_x, color='grey', linestyle='--')
leg_text_l.append(second_line_x_l)
if third_line_x:
plt.axvline(x=third_line_x, color='grey', linestyle='-.')
leg_text_l.append(third_line_x_l)
if fourth_line_x:
plt.axvline(x=fourth_line_x, color='grey', linestyle='-')
leg_text_l.append(fourth_line_x_l)
if special_line_x:
plt.axvline(x=special_line_x, color=color_list[6], linestyle='--')
leg_text_l.append(special_line_x_l)
# Adding horizontal lines
if first_line_y:
plt.axhline(y=first_line_y, color='grey', linestyle=':')
leg_text_l.append(first_line_y_l)
if second_line_y:
plt.axhline(y=second_line_y, color='grey', linestyle='--')
leg_text_l.append(second_line_y_l)
if third_line_y:
plt.axhline(y=third_line_y, color='grey', linestyle='-.')
leg_text_l.append(third_line_y_l)
# Adding a legend
if legend:
leg_text_l.append(leg_text)
plt.legend(labels=leg_text_l, fontsize=leg_fs, loc=legend_loc,
facecolor="white", framealpha=1)
# Showing the plot without additional text
plt.show()
def plot_stacked_bar(x, data, series_labels, col,
multidim=True, figsize_w=8, figsize_h=6,
title=None, title_fs=16,
frame=True,
category_labels=None,
label_fs=12, ticks_fs=12,
x_label=None, rot=0,
y_label=None,
legend=True, legend_loc=0, legend_fs=10,
add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10):
"""
This function plots a stacked bar chart with the provided data and
labels.
Keyword arguments:
x -- A list containing the x values (mandatory)
data -- A list of lists where each internal list contains
data of a series (mandatory)
series_labels -- List of series labels (strings) (these appear in
the legend) (mandatory)
col -- A list of integers controlling the colors of the series
(mandatory)
multidim -- Defines if data is multidimensional (default is True)
figsize_w -- The width of the plot area
figsize_h -- The height of the plot area
title -- A string containing the title of the chart
title_fs -- The title font size
frame -- If False, the figure frame is omitted as well as
ticks and labels on the y axis
category_labels -- List of category labels (strings) (these appear
on the x-axis)
label_fs -- x and y axis labels' font size
ticks_fs -- The tick values font size
rot -- The rotation of the x axis label (numerical)
(the default is horizontal)
y_label -- Label for the y-axis (string)
legend -- If true it shows a legend
legend_loc -- Used to position the legend compared to the centre
of the plot
legend_fs -- Legend font size
add_text -- Additional text to be shown in a box (string)
addtext_x -- Used to position the additional text box
addtext_y -- Used to position the additional text box
addtext_fs -- Font size of the additional text
"""
# Finding the number of categories
if multidim:
cat_number = len(data[0])
else:
cat_number = len(data)
# Preparing the indexes for the x axis
ind = list(range(cat_number))
# Initializing a list
axes = []
# Defining a numpy array containing the y coordinates of the bars
# (the bars of the first series are on the x axis)
bar_base = np.zeros(cat_number)
# Converting the list with the data into a numpy array
data = np.array(data)
# Creating a new figure
plt.figure(figsize=(figsize_w, figsize_h))
# Defining the used style
color_list = sns.color_palette(palette='deep')
# Adding a title (with some distance to the top of the plot)
plt.title(title, fontsize=title_fs, pad=20)
# Removing the frame and y axis ticks and values if so defined
if frame is False:
sns.despine(top=True, right=True, left=False, bottom=False)
# If category labels are provided, showing them on the x axis
if category_labels:
plt.xticks(ind, category_labels, fontsize=ticks_fs, rotation=rot)
# If a label for the x axis is provided, showing it on the x axis
if x_label:
plt.xlabel(x_label, fontsize=label_fs)
# If a label for the y axis is provided, showing it on the y axis
if y_label:
plt.ylabel(y_label, fontsize=label_fs)
if multidim:
# Iterating through the dimensions of the array
for i, row_data in enumerate(data):
# Creating the bars
axes.append(plt.bar(x, row_data, bottom=bar_base,
color=color_list[col[i]],
label=series_labels[i]))
# Incrementing the bar base height for the next series
# by the height of the bar of the previous series
bar_base += row_data
else:
# Creating the bars
axes.append(plt.bar(x, data))
# Creating a legend
if legend:
plt.legend(fontsize=legend_fs, loc=legend_loc,
facecolor="white", framealpha=1)
# Adding a text box with additional information
if add_text:
box_style = dict(facecolor='white')
plt.gcf().text(addtext_x, addtext_y,
add_text,
fontsize=addtext_fs, bbox=box_style)
# Showing the plot without additional text
plt.show()
def plot_cust_hbar(data,
figsize_w=8, figsize_h=6,
frame=True, grid=False,
ref_font_size=12,
title_text=None,
title_offset=20,
color_numb=0,
categ_labels=True,
labels=None,
rot=0,
show_values=False,
omitted_value=0,
percent=False,
center_al=True,
visible_digits=2):
"""
This function plots a horizontal bar charts for the provided data with
the provided labels and settings.
Keyword arguments:
data -- A sorted Series that contains categorical data
(mandatory)
figsize_w -- The width of the plot area
figsize_h -- The height of the plot area
frame -- If False, the figure frame is omitted as well as
ticks and labels on the y axis (default is True)
grid -- If True a horizontal grid is displayed. It works
only when frame=True (default is False)
ref_font_size -- Reference font size used for all the fonts
title_text -- A string containing the title of the chart
title_offset -- The offset of the title from the rest of the plot
color_numb -- An integer between 0 and 9 that indicates the
seaborn-deep color to be used for the bars
categ_labels -- A boolean variable that defines if category labels
shall appear (on the y-axis)
labels -- List of category labels (strings) used only if
categ_labels=True.
They override the existing labels
rot -- The rotation of the x axis label (numerical)
(the default is horizontal)
show_values -- If True, then numeric value labels will be shown on
each bar (default is False)
omitted_value -- The max value that shall not be shown in the bar
percent -- If true, it indicates that the values are in percentage
(default is False)
center_al -- A boolean variable that defines if the values shall be
written in the centre of the bar (default is True)
visible_digits -- Integer defining the number of decimal digits
to be seen in the value labels (the default is 2)
"""
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Defining the suffix to be shown in the bar values
if percent:
p = '%'
else:
p = ""
# Preparing the indexes for the x axis
ind = list(range(len(data)))
# Creating a new figure
fig = plt.figure(figsize=(figsize_w, figsize_h))
# Defining the used style
color_list = sns.color_palette(palette='deep')
# Removing y axis ticks
plt.gca().yaxis.set_ticks_position('none')
if frame is False:
# Removing the borders, if so defined
sns.despine(top=True, right=True, left=True, bottom=True)
# Removing ticks and values in the x axes
plt.gca().axes.get_xaxis().set_visible(False)
elif grid:
# Showing a vertical grid, if so defined
plt.gca().xaxis.grid(color='grey', alpha=0.25,
linestyle='-', linewidth=1)
# Adding a title (with some distance to the top of the plot)
plt.title(title_text, fontsize=ref_font_size*1.33,
loc='center', pad=title_offset)
# Creating the bar plot
plot = plt.barh(ind, data, color=color_list[color_numb])
# Showing category labels on the y axes, if so defined
if categ_labels:
# Overriding the index value if category labels are provided
if labels:
plt.yticks(ind, labels, fontsize=ref_font_size, rotation=rot)
else:
plt.yticks(ind, data.index.tolist(),
fontsize=ref_font_size, rotation=rot)
else:
# Removing ticks and values in the y axes
plt.gca().axes.get_yaxis().set_visible(False)
# Showing the bar values, if so defined
if show_values:
# Iterating through the bars in the plot
for bar in plot:
# Getting bar height and width
w, h = bar.get_width(), bar.get_height()
# Printing the values only if they are bigger than the defined value
if w > omitted_value:
if center_al is True:
# Positioning the text in the centre of the bar horizontally
# and vertically
plt.text(bar.get_x() + w/2, bar.get_y() + h/2,
"{}".format(round(w, visible_digits))+p,
fontsize=ref_font_size, color="white",
ha="center", va="center")
else:
# Positioning the text at the right of the bar horizontally
# and in the centre vertically
plt.text(bar.get_x() + w, bar.get_y() + h/2,
"{}".format(round(w, visible_digits))+p,
fontsize=ref_font_size,
ha="left", va="center")
# Showing the plot without additional text
plt.show()
def find_last_day():
    '''
    This function reads in a certain directory to find the latest CSV file
    and returns the date of the last file in a string in the format mm-dd-yyyy
    '''
    # Getting the list of files in the daily reports folder
    file_list = os.listdir('JHU_COVID-19/COVID-19/'
                           'csse_covid_19_data/'
                           'csse_covid_19_daily_reports')  # list of strings
    # Initializing a new list
    dates = []
    # Iterating through the original list
    for file in file_list:
        # If it is a csv file ...
        if re.search(r"\.csv$", file):
            # Extracting the date into a list of strings
            date = re.findall("[0-9]+[-][0-9]+[-][0-9]+", file)
            # Converting the format from string to date
            dt_date = dt.datetime.strptime(date[0], "%m-%d-%Y")
            # Appending the date to a list of dates (the new list)
            dates.append(dt_date)
    # Sorting the dates and taking the last one
    dates.sort(reverse=True)
    latest = dates[0]  # datetime
    # Converting the latest date to a string
    last_day = latest.strftime("%m-%d-%Y")
    return last_day
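As an illustration of why the file names are converted to dates before sorting, here is a minimal sketch based on `pathlib`. It is not the notebook's own code: the function name `latest_report_date` and the file names in the temporary folder are hypothetical. Sorting the names as plain strings would rank `12-31-2019` after `01-02-2020`, which is why the comparison is done on parsed dates.

```python
import datetime as dt
import tempfile
from pathlib import Path


def latest_report_date(folder):
    """Return the newest mm-dd-yyyy date among the CSV files in folder."""
    dates = []
    for path in Path(folder).glob("*.csv"):
        try:
            # The file stem is expected to look like "04-15-2020"
            dates.append(dt.datetime.strptime(path.stem, "%m-%d-%Y"))
        except ValueError:
            # Skipping CSV files whose name is not a date
            continue
    if not dates:
        raise FileNotFoundError("No dated CSV file found in " + str(folder))
    return max(dates).strftime("%m-%d-%Y")


# Demonstration on a temporary folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("12-31-2019.csv", "01-02-2020.csv", "README.md"):
        (Path(tmp) / name).touch()
    demo_latest = latest_report_date(tmp)
    print(demo_latest)
```

In this demonstration the most recent file is the one dated 01-02-2020, even though its name comes first alphabetically.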
def extract_country(Country, State="Not applicable", days=0):
'''
This function allows selecting data related to a specific Country
from the datasets produced by JHU.
It takes the following input:
- a string containing the Country name written with the first letter
as a capital letter (mandatory)
- a string containing the State name written with the first letter
as a capital letter (default = "Not applicable")
- an integer containing how many days to skip (default = 0)
It returns a tuple of 3 lists containing data related to confirmed,
recovered and deceased cases.
'''
# Extracting confirmed cases
confirm = world_conf_clean[(world_conf_clean['Country/Region'] == Country) &
(world_conf_clean['Province/State'] == State)]
# Extracting the columns containing the data for each day
# by skipping a number of days equal to days
confirm = confirm.iloc[:, 4+days:]
# Copying the result into a list
confirm_l = confirm.values.tolist()[0]
# Extracting recovered cases
recov = world_recov_clean[(world_recov_clean['Country/Region'] == Country) &
(world_recov_clean['Province/State'] == State)]
# Extracting the columns containing the data for each day
# by skipping a number of days equal to days
recov = recov.iloc[:, 4+days:]
# Copying the result into a list
recov_l = recov.values.tolist()[0]
# Extracting deceased cases
deceas = world_deceas_clean[(world_deceas_clean['Country/Region'] ==
Country) &
(world_deceas_clean['Province/State'] == State)]
# Extracting the columns containing the data for each day
# by skipping a number of days equal to days
deceas = deceas.iloc[:, 4+days:]
# Copying the result into a list
deceas_l = deceas.values.tolist()[0]
return confirm_l, recov_l, deceas_l
def prep_country_data(Country, State="Not applicable", days=0):
'''
This function prepares the data for a specific Country.
It takes the following inputs:
- a string variable that contains the name of the Country
written with the first letter as a capital letter (mandatory)
- a string containing the State name written with the first letter
as a capital letter (default = "Not applicable")
- an integer that tells the number of initial days in the time series
to skip (default = 0)
The function uses the following functions:
- 'extract_country' to extract Country-specific information from
the relevant dataframes
- 'calc_increments' to calculate the daily increments in a time series
- 'extract_non_null' to extract only the non null values of a time series
The output is a tuple with the following content:
- a list containing a time series with the cumulative confirmed cases
- a list containing a time series with the cumulative recovered cases
- a list containing a time series with the cumulative deceased cases
- a list containing a time series with the daily increment
in the confirmed cases
- a list containing a time series with the cumulative confirmed cases
starting from the day of the first positive case
'''
# Getting the name of the Country in small letters
country = Country.lower()
'''
# Creating descriptive file names
countryname_hiddendays = "{}_{}". format(country, days)
countryname_conf_hiddend = "{}_conf_{}". format(country, days)
countryname_recov_hiddend = "{}_recov_{}". format(country, days)
countryname_deceas_hiddend = "{}_deceas_{}". format(country, days)
countryname_conf_incr_hiddend = "{}_conf_incr_{}". format(country, days)
countryname_conf_pos = "{}_conf_pos". format(country)
'''
# Extracting country-specific data by using the function extract_country
countryname_hiddendays = extract_country(Country, State, days)
# Extracting the time series for the cumulative confirmed cases
countryname_conf_hiddend = countryname_hiddendays[0]
# Extracting the time series for the cumulative recovered cases
countryname_recov_hiddend = countryname_hiddendays[1]
# Extracting the time series for the cumulative deceased cases
countryname_deceas_hiddend = countryname_hiddendays[2]
# Extracting the time series for the daily increments in the confirmed cases
countryname_conf_incr_hiddend = calc_increments(countryname_conf_hiddend)
# Extracting the complete time series about the cumulative confirmed cases
complete_conf_series = extract_country(Country, State, 0)
# Extracting the time series for the cumulative confirmed cases
# starting from the day of the first positive case
countryname_conf_pos = extract_non_null(complete_conf_series[0])
return countryname_conf_hiddend, \
countryname_recov_hiddend, \
countryname_deceas_hiddend, \
countryname_conf_incr_hiddend, \
countryname_conf_pos
def extract_non_null(input_list):
    '''
    This function takes as input a list that contains a certain number of
    zero values, omits such values and returns what is left in a new list.
    '''
    # Initializing a list
    no_null = []
    # Looping through all the elements of the list
    for value in input_list:
        if value != 0:
            # Extracting non null values
            no_null.append(value)
    return no_null
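Note that `extract_non_null` removes every zero, not only the zeros before the first positive case. For a cumulative series that never goes back to zero the two are equivalent. Should only the leading zeros need to be removed, a possible sketch is shown below; the name `drop_leading_zeros` is hypothetical and the input is made up.

```python
from itertools import dropwhile


def drop_leading_zeros(series):
    """Return the series starting from its first non-zero element."""
    # dropwhile skips elements only while the condition holds,
    # so later zeros are kept
    return list(dropwhile(lambda v: v == 0, series))


# Made-up series with a zero after the first positive value
trimmed = drop_leading_zeros([0, 0, 1, 3, 0, 7])
print(trimmed)
```

Here the two leading zeros are dropped while the zero between 3 and 7 is kept.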
def pop_perc(values, pop):
    '''
    This function takes the following inputs:
    - a list of floats in units
    - a float in millions of units
    The function calculates the percentage values of the values in the list
    compared to the value in the single float multiplied by one million.
    The function is useful, for example, to calculate the number of
    confirmed Coronavirus cases per capita
    (as a percentage of the total population, which is given in millions).
    The function returns a pandas Series of floats.
    '''
    result = (pd.Series(values)/(pop*1000000))*100
    return result
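A short worked example may help to check the units. The sketch below repeats the formula of `pop_perc` so that it runs on its own; the case counts are made up for illustration and are not real data.

```python
import pandas as pd


def pop_perc(values, pop):
    # Same formula as in the function above: pop is expressed in millions
    return (pd.Series(values) / (pop * 1000000)) * 100


finland_pop = 5.513               # population in millions
made_up_cases = [0, 55.13, 5513]  # illustrative values, not real data
shares = pop_perc(made_up_cases, finland_pop)
print(shares.round(4).tolist())
```

With a population of 5.513 million, 5513 cases correspond to 0.1 % of the population.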
The source csv files are located in the following directories:
JHU_COVID-19/COVID-19/csse_covid_19_data/csse_covid_19_time_series
JHU_COVID-19/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports
Those directories shall be located under the directory containing this notebook.
# Loading the data files into pandas dataframes
# Loading the world time series
world_confirmed = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
'csse_covid_19_time_series/'
'time_series_covid19_confirmed_global.csv')
world_recovered = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
'csse_covid_19_time_series/'
'time_series_covid19_recovered_global.csv')
world_deceased = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
'csse_covid_19_time_series/'
'time_series_covid19_deaths_global.csv')
# Loading the latest daily report
last_day = find_last_day() # calling the function find_last_day
daily_report = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
'csse_covid_19_daily_reports/' + last_day + '.csv')
File descriptions
time_series_covid19_confirmed_global.csv: confirmed cases for each day for each Country
time_series_covid19_recovered_global.csv: recovered cases for each day for each Country
time_series_covid19_deaths_global.csv: deceased cases for each day for each Country
mm-dd-yyyy.csv: last available daily report
# Storing the total population for the Countries of interest (in millions)
# (source: Google)
italy_pop = 60.48
spain_pop = 46.66
germany_pop = 82.79
france_pop = 66.99
switzerland_pop = 8.57
netherlands_pop = 17.18
austria_pop = 8.822
belgium_pop = 11.4
portugal_pop = 10.29
luxembourg_pop = 0.602
denmark_pop = 5.603
norway_pop = 5.368
sweden_pop = 10.12
iceland_pop = 0.364
finland_pop = 5.513
uk_pop = 66.44
us_pop = 327.2
hubei_pop = 58.5
china_pop = 1386
restchina_pop = china_pop-hubei_pop
# Storing the population density for the Countries of interest (people/km2)
# (source: Google)
italy_dens = 201.3
spain_dens = 91.4
germany_dens = 240
france_dens = 122.34
switzerland_dens = 219
netherlands_dens = 488
austria_dens = 109
belgium_dens = 383
portugal_dens = 111
luxembourg_dens = 242
denmark_dens = 134
norway_dens = 15
sweden_dens = 25
iceland_dens = 3
finland_dens = 15
uk_dens = 274
us_dens = 36
hubei_dens = 310
china_dens = 145
# Storing the median age for the Countries of interest
# source: https://en.wikipedia.org/wiki/List_of_countries_by_median_age
italy_median_age = 45.5
spain_median_age = 42.7
france_median_age = 41.4
switzerland_median_age = 42.4
netherlands_median_age = 42.6
austria_median_age = 44.0
belgium_median_age = 41.4
portugal_median_age = 42.2
luxembourg_median_age = 39.3
denmark_median_age = 42.2
norway_median_age = 39.2
sweden_median_age = 41.2
iceland_median_age = 36.5
finland_median_age = 42.5
uk_median_age = 40.5
us_median_age = 38.1
china_median_age = 37.4
# List of containment actions taken by the Finnish Government
# Creating the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
measures = pd.DataFrame(
    [["12.3.",
      "First measures: gathering of more than 500 people banned"],
     ["16.3.",
      "State of emergency declared: closing schools, universities, museums, "
      "theatres, libraries, sport facilities; gathering of more than 10 "
      "people banned"],
     ["28.3.",
      "Additional measures: Uusimaa region borders closed, restaurant dining "
      "forbidden"],
     ["11.4.",
      "Additional measures: No passengers in ships from Germany, Sweden, "
      "Estonia"]],
    columns=['Date', 'Actions'])
Preliminary Data Analysis
The 3 time series files have columns for Province/State, Country/Region, latitude, longitude and data for each day. The columns related to the day are named in the format m/d/yy.
Each entry represents a different location. One Country can be associated with more than one State/Province and in this case one Country has more than one entry. This happens for US, China, Canada, France, Australia, United Kingdom, Netherlands and Denmark.
The daily report file has columns for Province/State, Country/Region, latitude, longitude and time stamp as well as cumulative confirmed, deaths and recovered cases.
Data Cleansing
NaN values have been handled by filling with the string "Not applicable".
Data Preparation
Separate datasets with no GPS coordinates and no time stamp have been created.
Separate datasets have been created to group data by Country.
A list of relevant dates for the plots has been created.
Country specific data has been extracted.
World-wide grand totals have been calculated.
A summary of the created datasets is available in section 5.5.
# Showing basic dataframe info
df_basic_data(world_confirmed)
# Showing basic dataframe info
df_basic_data(world_recovered)
# Showing basic dataframe info
df_basic_data(world_deceased)
# Showing basic dataframe info
df_basic_data(daily_report)
# Checking what the data looks like
print("world_confirmed")
world_confirmed.head()
# Checking what the data looks like
print("world_recovered")
world_recovered.head()
# Checking what the data looks like
print("world_deceased")
world_deceased.head()
# Checking what the data looks like
print("daily_report")
daily_report.head()
# Checking the Countries that are associated to more than one entry
print("Countries that are associated to more than one entry and number of entries\n")
print(daily_report['Country_Region'].value_counts().head(8).to_string())
# Checking the logic behind the classification
daily_report[daily_report['Country_Region'] == "Denmark"]
For France, United Kingdom, Netherlands and Denmark, the mainland data is obtained by selecting the entry where Country_Region = countryname and Province_State = NaN.
For the UK, this excludes the Isle of Man and the Channel Islands.
For Australia, it is enough to sum up all the entries where Country_Region = countryname. This includes Tasmania.
The same can be done for China, and this also includes Hainan and Hong Kong.
For Canada, summing all the entries also includes the people from the Diamond Princess and Grand Princess ships, as well as the population of Prince Edward Island.
The procedure to follow for the US has yet to be determined. So far, all the entries have been added up.
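The two aggregation rules described above can be sketched on a toy table. This is only a minimal illustration: the column names follow the daily report file, the numbers are invented, and `mainland_total` and `country_total` are hypothetical helper names that are not defined elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Toy daily report (invented numbers, for illustration only)
toy_report = pd.DataFrame({
    "Country_Region": ["Denmark", "Denmark", "Denmark", "Australia", "Australia"],
    "Province_State": ["Faroe Islands", "Greenland", np.nan, "Tasmania", "Victoria"],
    "Confirmed": [10, 5, 100, 7, 50],
})

def mainland_total(df, country):
    """Confirmed cases of the entry whose Province_State is missing (hypothetical helper)."""
    mask = (df["Country_Region"] == country) & df["Province_State"].isna()
    return int(df.loc[mask, "Confirmed"].sum())

def country_total(df, country):
    """Confirmed cases summed over all the entries of a Country (hypothetical helper)."""
    return int(df.loc[df["Country_Region"] == country, "Confirmed"].sum())

denmark_mainland = mainland_total(toy_report, "Denmark")
australia_all = country_total(toy_report, "Australia")
print(denmark_mainland, australia_all)
```

Once the NaN values have been replaced by the string "Not applicable" (see the Data Cleansing step), the mainland selection would compare Province_State with that string instead of testing for missing values.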
print("Population of different Countries in million (source: Google):\n\n",
"Italy:", italy_pop, "\n",
"Spain:", spain_pop, "\n",
"Germany:", germany_pop, "\n",
"France:", france_pop, "\n",
"Switzerland:", switzerland_pop, "\n",
"Netherlands:", netherlands_pop, "\n",
"Austria:", austria_pop, "\n",
"Belgium:", belgium_pop, "\n",
"Portugal:", portugal_pop, "\n",
"Luxembourg:", luxembourg_pop, "\n",
"Denmark:", denmark_pop, "\n",
"Norway:", norway_pop, "\n",
"Sweden:", sweden_pop, "\n",
"Iceland:", iceland_pop, "\n",
"Finland:", finland_pop, "\n",
"UK:", uk_pop, "\n")
print("NOTE: these figures are approximate.")
print("Median age of different Countries (source: Wikipedia):\n\n",
"Finland:", finland_median_age, "\n",
"Denmark:", denmark_median_age, "\n",
"Norway:", norway_median_age, "\n",
"Sweden:", sweden_median_age, "\n",
"Iceland:", iceland_median_age, "\n",
"Italy:", italy_median_age, "\n",
"Spain:", spain_median_age, "\n",
"France:", france_median_age, "\n",
"Switzerland:", switzerland_median_age, "\n",
"Netherlands:", netherlands_median_age, "\n",
"Austria:", austria_median_age, "\n",
"Belgium:", belgium_median_age, "\n",
"Portugal:", portugal_median_age, "\n",
"Luxembourg:", luxembourg_median_age, "\n")
print("NOTE: those figures are from year 2018.")
pd.options.display.max_colwidth = 150
print("Containment actions by the Finnish Government:\n")
measures.style.hide_index()
# Converting null values in strings with value "Not applicable"
world_conf_clean = world_confirmed.fillna("Not applicable")
world_recov_clean = world_recovered.fillna("Not applicable")
world_deceas_clean = world_deceased.fillna("Not applicable")
daily_rep_clean = daily_report.fillna("Not applicable")
# Dropping the GPS coordinates and storing the result in new datasets
world_conf_short = world_conf_clean.drop(['Lat', 'Long'], axis=1)
world_recov_short = world_recov_clean.drop(['Lat', 'Long'], axis=1)
world_deceas_short = world_deceas_clean.drop(['Lat', 'Long'], axis=1)
# Dropping the columns not related to the cases counters
daily_rep_short = daily_rep_clean.drop(['Lat',
'Long_',
'Last_Update',
'FIPS',
'Admin2',
'Combined_Key'],\
axis=1)
# Grouping by Country/Region and storing the result in new datasets
world_conf_group = world_conf_short.groupby(['Country/Region']).sum()
world_recov_group = world_recov_short.groupby(['Country/Region']).sum()
world_deceas_group = world_deceas_short.groupby(['Country/Region']).sum()
daily_rep_group = daily_rep_short.groupby(['Country_Region']).sum()
# Creating a list of dates
# Extracting only the columns containing the virus cases data for each day
world_conf_data = world_confirmed.iloc[:,4:]
# Extracting the column values (dates) and putting them in a list
days_all = world_conf_data.columns.values.tolist()
# Initializing an empty list
days_tot = []
# Looping through the days
for day in days_all:
    # Extracting month and day (the leading "m/d" part of "m/d/yy")
    new_element = re.findall("[0-9]+[/][0-9]+", day)[0]
    # Adding the result to the list
    days_tot.append(new_element)
print("List of days for the plots:\n")
days_tot
# Listing the Countries
print("List of Countries:\n")
world_conf_group.index.to_list()
# Creating a Pandas series containing median ages for different Countries
countries_median_age = pd.Series({'Finland': finland_median_age,
'Denmark': denmark_median_age,
'Norway': norway_median_age,
'Sweden': sweden_median_age,
'Iceland': iceland_median_age,
'Italy': italy_median_age,
'Spain': spain_median_age,
'France': france_median_age,
'Switzerland': switzerland_median_age,
'Austria': austria_median_age,
'Belgium': belgium_median_age,
'Portugal': portugal_median_age,
'Luxembourg': luxembourg_median_age,
'UK': uk_median_age})
# Calculating the minimum value
median_age_min = countries_median_age.min()
# Calculating the maximum value
median_age_max = countries_median_age.max()
# Calculating the median age range
median_age_range = median_age_max - median_age_min
print("The range of the median age in the Countries that are analyzed here is: "\
"{:.1f} years"\
.format(median_age_range))
# Selecting only the columns with the daily data
world_conf = world_conf_short.iloc[:,2:]
world_recov = world_recov_short.iloc[:,2:]
world_deceas = world_deceas_short.iloc[:,2:]
# Calculating cumulative worldwide data for each day
world_conf_tot = world_conf.sum()
world_recov_tot = world_recov.sum()
world_deceas_tot = world_deceas.sum()
# Calculating the active cases for each day
world_act_tot = list(np.array(world_conf_tot) - \
np.array(world_recov_tot) - \
np.array(world_deceas_tot))
# Calculating the daily increments in the confirmed cases
world_conf_incr = calc_increments(world_conf_tot)
# Calling the function extract_country to extract data related to Finland
# (skipping the first 6 days since they contain no confirmed cases)
Finland_6 = extract_country("Finland", "Not applicable", 6)
# Extracting the confirmed cases
finland_conf_6 = Finland_6[0]
# Extracting the recovered cases
finland_recov_6 = Finland_6[1]
# Extracting the deceased cases
finland_deceas_6 = Finland_6[2]
# Creating a list of days to use for Finnish charts
# (skipping the first 6 days)
days_fin = days_tot[6:]
print("Compact Finnish data set:\n")
print("first day:", days_fin[0])
print("number of days:", len(days_fin))
# Visualizing the complete series
print("Confirmed cases time series:")
finland_conf_6
# Visualizing the complete series
print("Recovered cases time series:")
finland_recov_6
# Visualizing the complete series
print("Deceased cases time series:")
finland_deceas_6
# Calculating the active cases
finland_act_6 = list(np.array(finland_conf_6) - \
np.array(finland_recov_6) - \
np.array(finland_deceas_6))
finland_act_6
# Creating a list of the same length as days_fin containing the increment of
# the confirmed cases compared to the previous day (first derivative)
# This tells how quickly the confirmed cases are growing
finland_conf_incr_6 = calc_increments(finland_conf_6)
# Visualizing the whole series
print("Daily increment in confirmed cases time series:")
finland_conf_incr_6
# Extracting all data about Finland (from the first available day)
Finland_0 = extract_country("Finland", "Not applicable", 0)
# Extracting the confirmed cases
finland_conf_0 = Finland_0[0]
# Extracting the recovered cases
finland_recov_0 = Finland_0[1]
# Extracting the deceased cases
finland_deceas_0 = Finland_0[2]
# Calculating the incremental values of the confirmed cases
finland_conf_incr_0 = calc_increments(finland_conf_0)
# Extracting the dataseries from the first confirmed case in the Country
# by using the function extract_non_null
# (the function keeps only the non null values, so it drops every zero and
# not only the leading ones, but this is OK since the total confirmed cases
# cannot decrease)
finland_conf_pos = extract_non_null(finland_conf_0)
# Using the function pop_perc to calculate the confirmed cumulative cases
# in percentage of the total population
finland_conf_0_perc = pop_perc(finland_conf_0, finland_pop)
# Calling the function prep_country_data to extract data related
# to the other Scandinavian Countries
# 1. Skipping the first 6 days of the time series
# Denmark
denmark_6 = prep_country_data("Denmark", "Not applicable", 6)
denmark_conf_6 = denmark_6[0]
denmark_recov_6 = denmark_6[1]
denmark_deceas_6 = denmark_6[2]
denmark_conf_incr_6 = denmark_6[3]
# Norway
norway_6 = prep_country_data("Norway", "Not applicable", 6)
norway_conf_6 = norway_6[0]
norway_recov_6 = norway_6[1]
norway_deceas_6 = norway_6[2]
norway_conf_incr_6 = norway_6[3]
# Sweden
sweden_6 = prep_country_data("Sweden", "Not applicable", 6)
sweden_conf_6 = sweden_6[0]
sweden_recov_6 = sweden_6[1]
sweden_deceas_6 = sweden_6[2]
sweden_conf_incr_6 = sweden_6[3]
# Iceland
iceland_6 = prep_country_data("Iceland", "Not applicable", 6)
iceland_conf_6 = iceland_6[0]
iceland_recov_6 = iceland_6[1]
iceland_deceas_6 = iceland_6[2]
iceland_conf_incr_6 = iceland_6[3]
# 2. complete time series
# Denmark
denmark_0 = prep_country_data("Denmark", "Not applicable", 0)
denmark_conf_0 = denmark_0[0]
denmark_recov_0 = denmark_0[1]
denmark_deceas_0 = denmark_0[2]
denmark_conf_pos = denmark_0[4]
denmark_conf_0_perc = pop_perc(denmark_conf_0, denmark_pop)
# Norway
norway_0 = prep_country_data("Norway", "Not applicable", 0)
norway_conf_0 = norway_0[0]
norway_recov_0 = norway_0[1]
norway_deceas_0 = norway_0[2]
norway_conf_pos = norway_0[4]
norway_conf_0_perc = pop_perc(norway_conf_0, norway_pop)
# Sweden
sweden_0 = prep_country_data("Sweden", "Not applicable", 0)
sweden_conf_0 = sweden_0[0]
sweden_recov_0 = sweden_0[1]
sweden_deceas_0 = sweden_0[2]
sweden_conf_pos = sweden_0[4]
sweden_conf_0_perc = pop_perc(sweden_conf_0, sweden_pop)
# Iceland
iceland_0 = prep_country_data("Iceland", "Not applicable", 0)
iceland_conf_0 = iceland_0[0]
iceland_recov_0 = iceland_0[1]
iceland_deceas_0 = iceland_0[2]
iceland_conf_pos = iceland_0[4]
iceland_conf_0_perc = pop_perc(iceland_conf_0, iceland_pop)
# Calling the function prep_country_data to extract data related to Italy
italy_0 = prep_country_data("Italy", "Not applicable", 0)
italy_conf_0 = italy_0[0]
italy_recov_0 = italy_0[1]
italy_deceas_0 = italy_0[2]
italy_conf_incr_0 = italy_0[3]
italy_act_0 = list(np.array(italy_conf_0) - \
np.array(italy_recov_0) - \
np.array(italy_deceas_0))
italy_conf_pos = italy_0[4]
italy_conf_0_perc = pop_perc(italy_conf_0, italy_pop)
# Calling the function prep_country_data to extract data related to Spain
spain_0 = prep_country_data("Spain", "Not applicable", 0)
spain_conf_0 = spain_0[0]
spain_recov_0 = spain_0[1]
spain_deceas_0 = spain_0[2]
spain_conf_incr_0 = spain_0[3]
spain_act_0 = list(np.array(spain_conf_0) - \
np.array(spain_recov_0) - \
np.array(spain_deceas_0))
spain_conf_pos = spain_0[4]
spain_conf_0_perc = pop_perc(spain_conf_0, spain_pop)
# Calling the function prep_country_data to extract data related to Germany
germany_0 = prep_country_data("Germany", "Not applicable", 0)
germany_conf_0 = germany_0[0]
germany_recov_0 = germany_0[1]
germany_deceas_0 = germany_0[2]
germany_conf_incr_0 = germany_0[3]
germany_act_0 = list(np.array(germany_conf_0) - \
np.array(germany_recov_0) - \
np.array(germany_deceas_0))
germany_conf_pos = germany_0[4]
germany_conf_0_perc = pop_perc(germany_conf_0, germany_pop)
# Calling the function prep_country_data to extract data related to France
france_0 = prep_country_data("France", "Not applicable", 0)
france_conf_0 = france_0[0]
france_recov_0 = france_0[1]
france_deceas_0 = france_0[2]
france_conf_incr_0 = france_0[3]
france_act_0 = list(np.array(france_conf_0) - \
np.array(france_recov_0) - \
np.array(france_deceas_0))
france_conf_pos = france_0[4]
france_conf_0_perc = pop_perc(france_conf_0, france_pop)
# Calling the function prep_country_data to extract data related to Switzerland
switzerland_0 = prep_country_data("Switzerland", "Not applicable", 0)
switzerland_conf_0 = switzerland_0[0]
switzerland_recov_0 = switzerland_0[1]
switzerland_deceas_0 = switzerland_0[2]
switzerland_conf_incr_0 = switzerland_0[3]
switzerland_act_0 = list(np.array(switzerland_conf_0) - \
np.array(switzerland_recov_0) - \
np.array(switzerland_deceas_0))
switzerland_conf_pos = switzerland_0[4]
switzerland_conf_0_perc = pop_perc(switzerland_conf_0, switzerland_pop)
# Calling the function prep_country_data to extract data related to Netherlands
netherlands_0 = prep_country_data("Netherlands", "Not applicable", 0)
netherlands_conf_0 = netherlands_0[0]
netherlands_recov_0 = netherlands_0[1]
netherlands_deceas_0 = netherlands_0[2]
netherlands_conf_incr_0 = netherlands_0[3]
netherlands_act_0 = list(np.array(netherlands_conf_0) - \
np.array(netherlands_recov_0) - \
np.array(netherlands_deceas_0))
netherlands_conf_pos = netherlands_0[4]
netherlands_conf_0_perc = pop_perc(netherlands_conf_0, netherlands_pop)
# Calling the function prep_country_data to extract data related to Austria
austria_0 = prep_country_data("Austria", "Not applicable", 0)
austria_conf_0 = austria_0[0]
austria_recov_0 = austria_0[1]
austria_deceas_0 = austria_0[2]
austria_conf_incr_0 = austria_0[3]
austria_act_0 = list(np.array(austria_conf_0) - \
np.array(austria_recov_0) - \
np.array(austria_deceas_0))
austria_conf_pos = austria_0[4]
austria_conf_0_perc = pop_perc(austria_conf_0, austria_pop)
# Calling the function prep_country_data to extract data related to Belgium
belgium_0 = prep_country_data("Belgium", "Not applicable", 0)
belgium_conf_0 = belgium_0[0]
belgium_recov_0 = belgium_0[1]
belgium_deceas_0 = belgium_0[2]
belgium_conf_incr_0 = belgium_0[3]
belgium_act_0 = list(np.array(belgium_conf_0) - \
np.array(belgium_recov_0) - \
np.array(belgium_deceas_0))
belgium_conf_pos = belgium_0[4]
belgium_conf_0_perc = pop_perc(belgium_conf_0, belgium_pop)
# Calling the function prep_country_data to extract data related to Portugal
portugal_0 = prep_country_data("Portugal", "Not applicable", 0)
portugal_conf_0 = portugal_0[0]
portugal_recov_0 = portugal_0[1]
portugal_deceas_0 = portugal_0[2]
portugal_conf_incr_0 = portugal_0[3]
portugal_act_0 = list(np.array(portugal_conf_0) - \
np.array(portugal_recov_0) - \
np.array(portugal_deceas_0))
portugal_conf_pos = portugal_0[4]
portugal_conf_0_perc = pop_perc(portugal_conf_0, portugal_pop)
# Calling the function prep_country_data to extract data related to Luxembourg
luxembourg_0 = prep_country_data("Luxembourg", "Not applicable", 0)
luxembourg_conf_0 = luxembourg_0[0]
luxembourg_recov_0 = luxembourg_0[1]
luxembourg_deceas_0 = luxembourg_0[2]
luxembourg_conf_incr_0 = luxembourg_0[3]
luxembourg_act_0 = list(np.array(luxembourg_conf_0) - \
np.array(luxembourg_recov_0) - \
np.array(luxembourg_deceas_0))
luxembourg_conf_pos = luxembourg_0[4]
luxembourg_conf_0_perc = pop_perc(luxembourg_conf_0, luxembourg_pop)
# Calling the function prep_country_data to extract data related to UK
uk_0 = prep_country_data("United Kingdom", "Not applicable", 0)
uk_conf_0 = uk_0[0]
uk_recov_0 = uk_0[1]
uk_deceas_0 = uk_0[2]
uk_conf_incr_0 = uk_0[3]
uk_act_0 = list(np.array(uk_conf_0) - \
np.array(uk_recov_0) - \
np.array(uk_deceas_0))
uk_conf_pos = uk_0[4]
uk_conf_0_perc = pop_perc(uk_conf_0, uk_pop)
# Calling the function prep_country_data to extract data related to US
us_0 = prep_country_data("US", "Not applicable", 0)
us_conf_0 = us_0[0]
us_recov_0 = us_0[1]
us_deceas_0 = us_0[2]
us_conf_incr_0 = us_0[3]
us_act_0 = list(np.array(us_conf_0) - \
np.array(us_recov_0) - \
np.array(us_deceas_0))
us_conf_pos = us_0[4]
us_conf_0_perc = pop_perc(us_conf_0, us_pop)
# Daily Report from China broken by Provinces
daily_rep_short[daily_rep_short['Country_Region'] == "China"]
print("Number of entries related to China:")
len(daily_rep_short[daily_rep_short['Country_Region'] == "China"])
# Extracting data related to Hubei province by screening out the text variables
# and putting the result in list format
hubei_conf_0 = world_conf_short[(world_conf_short['Country/Region'] == 'China') & \
(world_conf_short['Province/State'] == 'Hubei')]
hubei_conf_0 = hubei_conf_0.iloc[:, 2:].values.tolist()[0]
hubei_conf_incr_0 = calc_increments(hubei_conf_0)
hubei_conf_0_perc = pop_perc(hubei_conf_0, hubei_pop)
hubei_recov_0 = world_recov_short[(world_recov_short['Country/Region'] == 'China') & \
(world_recov_short['Province/State'] == 'Hubei')]
hubei_recov_0 = hubei_recov_0.iloc[:, 2:].values.tolist()[0]
hubei_deceas_0 = world_deceas_short[(world_deceas_short['Country/Region'] == 'China') & \
(world_deceas_short['Province/State'] == 'Hubei')]
hubei_deceas_0 = hubei_deceas_0.iloc[:, 2:].values.tolist()[0]
hubei_act_0 = list(np.array(hubei_conf_0) - \
np.array(hubei_recov_0) - \
np.array(hubei_deceas_0))
# Extracting data related to all the other provinces, making the sum
# and putting the result in list format
restchina_conf_0 = world_conf_short[(world_conf_short['Country/Region'] == 'China') & \
(world_conf_short['Province/State'] != 'Hubei')]
restchina_conf_0 = restchina_conf_0.groupby(['Country/Region']).sum()
restchina_conf_0 = restchina_conf_0.values.tolist()[0]
restchina_conf_incr_0 = calc_increments(restchina_conf_0)
restchina_conf_0_perc = pop_perc(restchina_conf_0, restchina_pop)
restchina_recov_0 = world_recov_short[(world_recov_short['Country/Region'] == 'China') & \
(world_recov_short['Province/State'] != 'Hubei')]
restchina_recov_0 = restchina_recov_0.groupby(['Country/Region']).sum()
restchina_recov_0 = restchina_recov_0.values.tolist()[0]
restchina_deceas_0 = world_deceas_short[(world_deceas_short['Country/Region'] == 'China') & \
(world_deceas_short['Province/State'] != 'Hubei')]
restchina_deceas_0 = restchina_deceas_0.groupby(['Country/Region']).sum()
restchina_deceas_0 = restchina_deceas_0.values.tolist()[0]
restchina_act_0 = list(np.array(restchina_conf_0) - \
np.array(restchina_recov_0) - \
np.array(restchina_deceas_0))
Within this document, different datasets are used for different purposes. This section provides a summary as a useful reference and describes the naming rules that have been used. Those variables that have been created temporarily just for reason of code clarity are not included in this list.
world_conf_clean
world_recov_clean
world_deceas_clean
daily_rep_clean
world_conf_short, world_recov_short, world_deceas_short
world_conf, world_recov, world_deceas
world_conf_tot, world_recov_tot, world_deceas_tot
world_act_tot
world_conf_incr
daily_rep_short
world_conf_group, world_recov_group, world_deceas_group
daily_rep_group
days_tot
days_fin
country_conf_x, country_recov_x, country_deceas_x
country_act_x
country_conf_incr_x
country_conf_0_perc
country_conf_pos
The basic reproductive number, R0, is the average number of secondary infections generated by one infectious individual. When R0 > 1 the infection is able to spread. The aim of non-pharmaceutical interventions (NPIs), such as social distancing, is to reduce the value of R0.
The Case Fatality Ratio (CFR) is the proportion of detected cases of a given disease that die as a result of it.
Surveillance is typically biased towards detecting clinically severe cases, particularly at the start of an epidemic when diagnostic capacity is limited. This leads to an overestimation of the CFR.
On the other hand, there is a time interval (2 to 3 weeks) between the onset of symptoms and death or recovery. Therefore, the simple ratio deceased/infected measured during a growing epidemic does not capture the outcome of all the infected cases, which leads to an underestimation of the CFR.
NOTE: The Infection Fatality Rate is the percentage of people who get the infection and then die. This number is much harder to estimate than the CFR, since the total number of people who have actually been infected in a certain area is not known.
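The two opposite biases can be made concrete with a small calculation. The sketch below compares the simple ratio deceased/confirmed with a ratio computed only on the closed cases (deceased plus recovered). The second estimator is a commonly used alternative and not a method applied in this notebook; the function names and the figures are invented for illustration.

```python
def naive_cfr(deceased, confirmed):
    """Deceased / confirmed, in percent (tends to underestimate during growth)."""
    return 100.0 * deceased / confirmed if confirmed else float("nan")

def closed_cases_cfr(deceased, recovered):
    """Deceased / (deceased + recovered), in percent (uses only resolved cases)."""
    closed = deceased + recovered
    return 100.0 * deceased / closed if closed else float("nan")

# Invented cumulative figures for a single day of a growing epidemic
confirmed, recovered, deceased = 1000, 300, 20

cfr_naive = naive_cfr(deceased, confirmed)
cfr_closed = closed_cases_cfr(deceased, recovered)
print(f"naive: {cfr_naive:.2f}%, closed cases only: {cfr_closed:.2f}%")
```

While most cases are still open the two values can differ considerably; they converge once nearly all the cases have been resolved.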
The following curves are shown in the plots contained in this section:
The first four curves show the cumulative cases in a certain region since the start of the epidemic.
The cumulative confirmed cases curve is expected to grow exponentially and then slowly flatten out towards a horizontal line. Government decisions and people's behavior can affect the shape of this curve. The aim is to keep the curve from becoming too steep, so that the capacity of the hospitals in the Country is not saturated. However, it should be noted that the effects of Government and people's actions are not immediate, due to the incubation period.
The cumulative recovered cases curve follows the cumulative confirmed cases with a certain delay in time and a lower y value due to the amount of deceased cases.
The cumulative active cases are given by the confirmed cases minus the recovered cases minus the deceased cases. It is the only one of the cumulative cases curves that can decrease over time, and this happens when the number of confirmed cases grows more slowly than the combined number of recovered and deceased cases. This curve is expected to have a bell shape (an upside-down U).
The new confirmed daily cases show the speed at which the virus is spreading. This curve is also expected to have a bell shape (an upside-down U). Since it shows daily values, it also shows some noise. Some of this noise might be due to mistakes in reporting the daily data (sometimes the data of a certain day is reported together with the data of the next day). This kind of mistake does not affect the grand total and has only a very small effect on the trend of the curves.
The new recovered daily cases curve looks similar to the new confirmed daily cases curve with a delay in time and lower y values.
The incremental daily active cases curve shows two peaks of opposite sign. The x value where the curve turns negative corresponds to the peak of the cumulative active cases curve.
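The relation between the cumulative curves and their daily increments can be checked on a short invented series. The sketch below is an illustration only: `daily_increments` is a hypothetical stand-in, and it is an assumption that the notebook's own `calc_increments` treats the first day in the same way.

```python
import numpy as np

# Invented cumulative series (one value per day)
conf = np.array([0, 2, 5, 9, 12, 13, 13])
recov = np.array([0, 0, 1, 3, 6, 9, 11])
deceas = np.array([0, 0, 0, 1, 1, 2, 2])

# Active cases: confirmed minus recovered minus deceased
active = conf - recov - deceas

def daily_increments(cumulative):
    """Day-over-day change, same length as the input (first day: its own value)."""
    cumulative = np.asarray(cumulative)
    return np.diff(cumulative, prepend=0)

active_incr = daily_increments(active)
# First day on which the active cases reach their maximum
peak_day = int(np.argmax(active))
print(active.tolist(), active_incr.tolist(), peak_day)
```

In this toy series the increments of the active cases stay non-negative up to the peak of the active cases and are negative afterwards.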
NOTE: The actual number of infected people is very likely above the number of counted confirmed cases, since not all the population is tested and there might be many infected persons showing no symptoms. However, assuming a constant testing policy during the whole observation period, the rate of change is unaffected by systematic under-reporting, and therefore a lot of useful information can still be obtained from those curves.
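The argument about systematic under-reporting can be verified numerically. The sketch below assumes that a constant fraction of the real cases is detected on every day; both this fraction and the series are invented, and `growth_factors` is a hypothetical helper.

```python
import numpy as np

# Invented "true" cumulative cases and a constant detection fraction (assumption)
true_cases = np.array([10.0, 20.0, 40.0, 70.0, 100.0])
detected_fraction = 1 / 3
reported_cases = detected_fraction * true_cases

def growth_factors(cumulative):
    """Ratio between each day's cumulative value and the previous day's."""
    cumulative = np.asarray(cumulative, dtype=float)
    return cumulative[1:] / cumulative[:-1]

true_growth = growth_factors(true_cases)
reported_growth = growth_factors(reported_cases)
print(true_growth, reported_growth)
```

The constant factor cancels out in the day-over-day ratios, so the reported curve grows at the same relative rate as the real one. This no longer holds if the detected fraction changes over time, for example after a change in testing policy.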
"The only real data we have is from the flights used by a number of Countries to repatriate their citizens. The whole population was tested on those planes. If the population samples given by the passengers of those flights were representative of the whole population, we could conclude that the epidemic is at least 3 times larger compared to what the collected data shows."
Feb 12th, Prof. Neil Ferguson, https://www.imperial.ac.uk/people/neil.ferguson
"By comparing the number of flights that came into a certain Country from the worst affected area in China (Wuhan City) with the cases detected in that Country, it can be found that the number of cases per flight varies quite a lot depending on the Country.
Singapore had a relatively high number of cases compared to other Countries. By using that data as a benchmark, that is, by assuming that Singapore has detected all the cases, the result is that worldwide approximately 2/3 of the cases have not been detected."
Professor Christl Donnelly, https://www.imperial.ac.uk/people/c.donnelly
The first complete curves are related to China. Let's analyze the curves related to China other than Hubei province. The curves can be divided into 4 phases.
1) Exponential increase phase
2) Linear increase phase
3) Slowed-down increase phase
4) No increase phase
Note that a new wave might follow (as it might happen in China outside Hubei).
Note that should the testing policy change during the observation period, the curve might look different.
Note also that an early release of containment measures might cause the curves to differ from this example and might lead to new peaks before the active cases curve goes to zero.
# Plotting daily cumulative cases in the rest of China
cust_line_plot((days_tot, restchina_conf_0, ".", '-', 0, "confirmed cases"),
(days_tot, restchina_recov_0, ".", '-', 2, "recovered cases"),
(days_tot, restchina_deceas_0, ".", '-', 3, "deceased cases"),
(days_tot, restchina_act_0, ".", '-', 1, "active cases"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative cases in China "\
"other than Hubei over time",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
first_line_x='1/30',
first_line_x_l='End of the exponential increase phase',
second_line_x='2/5',
second_line_x_l='End of the linear increase phase',
third_line_x='2/22',
third_line_x_l='End of the slowed-down increase phase',
fourth_line_x='3/13',
fourth_line_x_l='End of the no increase phase',
special_line_x='2/11',
special_line_x_l='Herd Immunity')
# Plotting daily increments in confirmed cases in the rest of China
cust_bar_plot((days_tot, restchina_conf_incr_0, 0, "New daily confirmed cases"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases in China "\
"other than Hubei",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
first_line_x='1/30',
first_line_x_l='End of the exponential increase phase',
second_line_x='2/5',
second_line_x_l='End of the linear increase phase',
third_line_x='2/22',
third_line_x_l='End of the slowed-down increase phase',
fourth_line_x='3/13',
fourth_line_x_l='End of the no increase phase',
special_line_x='2/11',
special_line_x_l='Herd Immunity')
# Plotting new daily deceased cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_deceas_0), 3,
"Daily deceased cases by COVID-19"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily (reported) deceased cases "\
"in China other than Hubei",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0,
first_line_x='1/30',
first_line_x_l='End of the exponential increase phase',
second_line_x='2/5',
second_line_x_l='End of the linear increase phase',
third_line_x='2/22',
third_line_x_l='End of the slowed-down increase phase',
fourth_line_x='3/13',
fourth_line_x_l='End of the no increase phase',
special_line_x='2/11',
special_line_x_l='Herd Immunity')
# Plotting new daily recovered cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_recov_0), 2,
"Daily recovered cases by COVID-19"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily (reported) recovered cases "\
"in China other than Hubei",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0,
first_line_x='1/30',
first_line_x_l='End of the exponential increase phase',
second_line_x='2/5',
second_line_x_l='End of the linear increase phase',
third_line_x='2/22',
third_line_x_l='End of the slowed-down increase phase',
fourth_line_x='3/13',
fourth_line_x_l='End of the no increase phase',
special_line_x='2/11',
special_line_x_l='Herd Immunity')
# Plotting daily increments in the active cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_act_0), 1,
"Daily increments in the active cases by COVID-19"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 increments in the daily active cases "\
"in China other than Hubei",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0,
first_line_x='1/30',
first_line_x_l='End of the exponential increase phase',
second_line_x='2/5',
second_line_x_l='End of the linear increase phase',
third_line_x='2/22',
third_line_x_l='End of the slowed-down increase phase',
fourth_line_x='3/13',
fourth_line_x_l='End of the no increase phase',
special_line_x='2/11',
special_line_x_l='Herd Immunity')
Comments on the plot below
The number of recovered cases on 4/2 does not match the official figures from Finland.
The increased speed in confirmed cases on 4/4 is due to a change in testing policy.
# Plotting daily cumulative cases in Finland
cust_line_plot((days_fin, finland_conf_6, ".", '-', 0, "confirmed cases"),
(days_fin, finland_recov_6, ".", '-', 2, "recovered cases"),
(days_fin, finland_deceas_6, ".", '-', 3, "deceased cases"),
#(days_fin, finland_act_6, ".", '-', 1, "active cases"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative cases in Finland over time",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0,
first_line_x='3/12', first_line_x_l='First actions',
second_line_x='3/16', second_line_x_l='State of emergency declared',
third_line_x='3/28', third_line_x_l='Additional actions',
fourth_line_x='4/11', fourth_line_x_l='Tighter border control')
print("Concrete actions by the Finnish government")
measures.style.hide_index()
# Plotting new daily confirmed Coronavirus cases in Finland
cust_bar_plot((days_fin, finland_conf_incr_6, 0, "New daily confirmed cases"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases in Finland",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0,
first_line_x='3/12', first_line_x_l='First actions',
second_line_x='3/16', second_line_x_l='State of emergency declared',
third_line_x='3/28', third_line_x_l='Additional actions',
fourth_line_x='4/11', fourth_line_x_l='Tighter border control')
Comment on the plot above
NOTE: The increased speed in confirmed cases on 4/4 is due to a change in testing policy.
The data from 3/12 has been reported on 3/13.
# Plotting new daily deceased cases in Finland
cust_bar_plot((days_fin, calc_increments(finland_deceas_6), 3, ""),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily deceased cases in Finland",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=False,
leg_fs=12,
legend_loc=0)
Comment on the plot above
Obviously, there is something wrong in the source data since it shows that the cumulative deaths on 4/6 are smaller than the cumulative deaths of 4/5.
Description of the plots of this section
It appears that the Finnish curve is quite smooth compared to the other curves. Only Iceland has a smoother curve. This would suggest that the virus is not spreading faster in Finland than in most of the other Scandinavian Countries. When all the curves are shifted so that each starts on the day of the first confirmed case in its Country, the Finnish curve is the slowest to grow. See also the note below.
Even though the virus started spreading later in Finland, the first recovered case happened much earlier than in the other Scandinavian Countries. However, the biggest number of recovered cases is in Sweden.
Finland has also the lowest number of deceased cases (Sweden has the highest).
The high numbers for Sweden are not surprising, given the quite relaxed containment policy in the Country.
NOTE: The testing policy in each Country considerably affects how the curve looks. The fewer people are tested, the better the curve looks.
NOTE: The data from Denmark does not include Faroe Islands and Greenland.
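The shift to the day of the first confirmed case can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the `*_conf_pos` lists of this notebook: the helper name `from_first_case` and the example series are hypothetical.

```python
def from_first_case(cumulative):
    """Drop the leading days with zero cases, so that index 0 is the
    first day with at least one confirmed case."""
    for i, value in enumerate(cumulative):
        if value > 0:
            return list(cumulative[i:])
    return []  # no confirmed case in the whole series


# Made-up cumulative series for two Countries (illustration only)
example_a = [0, 0, 1, 3, 7]
example_b = [0, 2, 2, 5, 9]

aligned_a = from_first_case(example_a)
aligned_b = from_first_case(example_b)

# The x axis becomes "days since the first confirmed case"
days_since_first_a = list(range(len(aligned_a)))
print(days_since_first_a, aligned_a)
print(list(range(len(aligned_b))), aligned_b)
```

After the shift the series usually have different lengths, which is why each curve needs its own x values.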
# Comparing cumulative confirmed cases over time in Scandinavia
cust_line_plot((days_fin, finland_conf_6, ".", '-', 0, "Finland"),
(days_fin, denmark_conf_6, ".", '-', 3, "Denmark"),
(days_fin, norway_conf_6, ".", '-', 6, "Norway"),
(days_fin, sweden_conf_6, ".", '-', 8, "Sweden"),
(days_fin, iceland_conf_6, ".", '-', 4, "Iceland"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases in "\
"Scandinavia over time",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing cumulative confirmed cases over time in Scandinavia
# starting from the day of the first confirmed case
cust_line_plot((list(range(len(finland_conf_pos))), finland_conf_pos,
".", '-', 0, "Finland"),
(list(range(len(denmark_conf_pos))), denmark_conf_pos,
".", '-', 3, "Denmark"),
(list(range(len(norway_conf_pos))), norway_conf_pos,
".", '-', 6, "Norway"),
(list(range(len(sweden_conf_pos))), sweden_conf_pos,
".", '-', 8, "Sweden"),
(list(range(len(iceland_conf_pos))), iceland_conf_pos,
".", '-', 4, "Iceland"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Scandinavia over time",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="Days since the first confirmed case in the Country",
rot=0,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing new daily confirmed Coronavirus cases in Scandinavia
cust_line_plot((days_fin, finland_conf_incr_6, ".", '-', 0, "Finland"),
(days_fin, denmark_conf_incr_6, ".", '-', 3, "Denmark"),
(days_fin, norway_conf_incr_6, ".", '-', 6, "Norway"),
(days_fin, sweden_conf_incr_6, ".", '-', 8, "Sweden"),
(days_fin, iceland_conf_incr_6, ".", '-', 4, "Iceland"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases "\
"in Scandinavia",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
Comments to the next two plots:
The data for Iceland is corrupted (it decreases at some point, which cumulative data cannot do), so Iceland is not included in these plots.
# Comparing cumulative recovered cases over time in Scandinavia
plot_stacked_bar(days_fin,
[finland_recov_6, denmark_recov_6, norway_recov_6, sweden_recov_6],
["Finland", "Denmark", "Norway", "Sweden"],
col=[0, 3, 6, 8],
multidim=True, figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative (reported) recovered cases in "\
"Scandinavia over time",
title_fs=18,
frame=False,
category_labels=days_tot,
label_fs = 12, ticks_fs=12,
x_label="month/day", rot=90,
y_label="Total of cases in all the Countries",
legend=True, legend_loc = 2, legend_fs=12,
add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
# Comparing cumulative deceased cases over time in Scandinavia
plot_stacked_bar(days_fin,
[finland_deceas_6, denmark_deceas_6, norway_deceas_6, sweden_deceas_6],
["Finland", "Denmark", "Norway", "Sweden"],
col=[0, 3, 6, 8],
multidim=True, figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative (reported) deceased cases in "\
"Scandinavia over time",
title_fs=18,
frame=False,
category_labels=days_tot,
label_fs = 12, ticks_fs=12,
x_label="month/day", rot=90,
y_label="Total of cases in all the Countries",
legend=True, legend_loc = 2, legend_fs=12,
add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
Comment to the plots in this section
Finland also has the lowest curves compared to the other European Countries.
The plots of the new confirmed cases show the same pattern for all those Countries. This might be because those plots depend heavily on how many people are tested on a given day.
NOTE: When comparing those curves, please note that the testing policy in each Country considerably affects the shape of the curve. The fewer people are tested, the better the curve looks.
NOTE: The data from France and Netherlands does not include offshore territories.
# Comparing cumulative confirmed cases over time for Finland,
# Italy, Spain, Germany, France and Switzerland
cust_line_plot((days_tot, finland_conf_0, ".", '-', 0, "Finland"),
(days_tot, italy_conf_0, ".", '-', 2, "Italy"),
(days_tot, spain_conf_0, ".", '-', 1, "Spain"),
(days_tot, germany_conf_0, ".", '-', 4, "Germany"),
(days_tot, france_conf_0, ".", '-', 3, "France"),
(days_tot, switzerland_conf_0, ".", '-', 6, "Switzerland"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Finland compared to \nItaly, Spain, Germany, France "\
"and Switzerland",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing new daily confirmed cases over time for Finland,
# Italy, Spain, Germany, France and Switzerland
cust_line_plot((days_tot, finland_conf_incr_0, ".", '-', 0, "Finland"),
(days_tot, italy_conf_incr_0, ".", '-', 2, "Italy"),
(days_tot, spain_conf_incr_0, ".", '-', 1, "Spain"),
(days_tot, germany_conf_incr_0, ".", '-', 4, "Germany"),
(days_tot, france_conf_incr_0, ".", '-', 3, "France"),
(days_tot, switzerland_conf_incr_0, ".", '-', 6, "Switzerland"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases "\
"in Finland compared to \nItaly, Spain, Germany, France "\
"and Switzerland",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing cumulative confirmed cases over time for Finland,
# Netherlands, Austria, Belgium, Portugal and Luxembourg
cust_line_plot((days_tot, finland_conf_0, ".", '-', 0, "Finland"),
(days_tot, netherlands_conf_0, ".", '-', 4, "Netherlands"),
(days_tot, austria_conf_0, ".", '-', 3, "Austria"),
(days_tot, belgium_conf_0, ".", '-', 8, "Belgium"),
(days_tot, portugal_conf_0, ".", '-', 2, "Portugal"),
(days_tot, luxembourg_conf_0, ".", '-', 9, "Luxembourg"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Finland compared to \nNetherlands, Austria, "\
"Belgium, Portugal and Luxembourg",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing new daily confirmed cases for Finland,
# Netherlands, Austria, Belgium, Portugal and Luxembourg
cust_line_plot((days_tot, finland_conf_incr_0, ".", '-', 0, "Finland"),
(days_tot, netherlands_conf_incr_0, ".", '-', 4, "Netherlands"),
(days_tot, austria_conf_incr_0, ".", '-', 3, "Austria"),
(days_tot, belgium_conf_incr_0, ".", '-', 8, "Belgium"),
(days_tot, portugal_conf_incr_0, ".", '-', 2, "Portugal"),
(days_tot, luxembourg_conf_incr_0, ".", '-', 9, "Luxembourg"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases "\
"in Finland compared to \nNetherlands, Austria, "\
"Belgium, Portugal and Luxembourg",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
Comment to the plots in this section
UK and US have followed quite relaxed policies in containing the spread of the virus during the first days.
NOTE: The data from UK does not include the Isle of Man and the Channel Islands.
# Comparing cumulative confirmed Coronavirus cases in Finland, UK and US
cust_line_plot((days_tot, finland_conf_0, ".", '-', 0, "Finland"),
(days_tot, uk_conf_0, ".", '-', 4, "UK"),
(days_tot, us_conf_0, ".", '-', 3, "US"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Finland, UK and US",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing new daily confirmed Coronavirus cases in Finland, UK and US
cust_line_plot((days_tot, finland_conf_incr_0, ".", '-', 0, "Finland"),
(days_tot, uk_conf_incr_0, ".", '-', 4, "UK"),
(days_tot, us_conf_incr_0, ".", '-', 3, "US"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases "\
"in Finland, UK and US",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
Comments to the plots in this section
The curves related to the cumulative confirmed cases seem to have the same shape. The main difference seems to be the height.
The height of those curves can differ for different reasons, including:
It might be interesting to isolate the first variable, Country population, by dividing the values by the Country population in order to calculate the number of cases per capita. The result is shown in the first plot. The plot shows that the other variables can still change the curve by as much as a factor of 10.
NOTE: The Country population figures are approximate.
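The per capita normalization described above amounts to a single division. The sketch below is only illustrative: `cases_as_percentage` is a hypothetical name (it is not the `pop_perc` function used in this notebook, which is defined elsewhere), and the population and case numbers are made up.

```python
def cases_as_percentage(cumulative, population):
    """Express each cumulative value as a percentage of the population."""
    return [100.0 * value / population for value in cumulative]


# Made-up numbers, for illustration only
example_population = 1_000_000
example_cases = [0, 50, 200]

example_perc = cases_as_percentage(example_cases, example_population)
print(example_perc)
```

Dividing by the population removes only the effect of the Country size. All the other variables still act on the normalized curve.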
print("The range of the median age in the Countries that are analyzed here is: "\
"{:.1f} years"\
.format(median_age_range))
# Comparing cumulative confirmed cases over time for Finland,
# and other Scandinavian Countries in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
(days_tot, denmark_conf_0_perc, ".", '-', 3, "Denmark"),
(days_tot, norway_conf_0_perc, ".", '-', 6, "Norway"),
(days_tot, sweden_conf_0_perc, ".", '-', 8, "Sweden"),
(days_tot, iceland_conf_0_perc, ".", '-', 4, "Iceland"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Finland and other Scandinavian Countries \n"\
"in percentage of each Country population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
Comments to the plot above
Finland has the lowest number of confirmed cases per capita. Iceland has the highest number.
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
(days_tot, italy_conf_0_perc, ".", '-', 2, "Italy"),
(days_tot, spain_conf_0_perc, ".", '-', 1, "Spain"),
(days_tot, germany_conf_0_perc, ".", '-', 4, "Germany"),
(days_tot, france_conf_0_perc, ".", '-', 3, "France"),
(days_tot, switzerland_conf_0_perc, ".", '-', 6, "Switzerland"),
(days_tot, luxembourg_conf_0_perc, ".", '-', 9, "Luxembourg"),
(days_tot, belgium_conf_0_perc, ".", '-', 8, "Belgium"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Finland and other European Countries \n"\
"in percentage of each Country population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
Comments to the plot above
Luxembourg has a higher number of confirmed cases per capita than Iceland.
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
(days_tot, netherlands_conf_0_perc, ".", '-', 4, "Netherlands"),
(days_tot, austria_conf_0_perc, ".", '-', 3, "Austria"),
(days_tot, portugal_conf_0_perc, ".", '-', 2, "Portugal"),
(days_tot, uk_conf_0_perc, ".", '-', 6, "UK"),
(days_tot, us_conf_0_perc, ".", '-', 5, "US"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Finland and other European Countries + UK & US \n"\
"in percentage of each Country population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
Comments to the plot
One reason why the UK and Finland curves started quite low might be that these Countries are fairly isolated geographically, so the virus started to spread there later.
However, the curves clearly show that in Countries that did not take prompt containment actions, such as UK and US, the curves became steeper.
In an attempt to eliminate the variability due to the different testing policies in different Countries, let's draw similar plots using the cumulative deceased cases rather than the cumulative confirmed cases as the reference curve.
# Comparing cumulative deceased cases over time for Finland,
# and other Scandinavian Countries in percentage of the Country population
cust_line_plot((days_tot, pop_perc(finland_deceas_0, finland_pop),
".", '-', 0, "Finland"),
(days_tot, pop_perc(denmark_deceas_0, denmark_pop),
".", '-', 3, "Denmark"),
(days_tot, pop_perc(norway_deceas_0, norway_pop),
".", '-', 6, "Norway"),
(days_tot, pop_perc(sweden_deceas_0, sweden_pop),
".", '-', 8, "Sweden"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative deceased cases "\
"in Finland and other Scandinavian Countries \n"\
"in percentage of each Country population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries in percentage of the Country population
cust_line_plot((days_tot, pop_perc(finland_deceas_0, finland_pop),
".", '-', 0, "Finland"),
(days_tot, pop_perc(italy_deceas_0, italy_pop),
".", '-', 2, "Italy"),
(days_tot, pop_perc(spain_deceas_0, spain_pop),
".", '-', 1, "Spain"),
(days_tot, pop_perc(germany_deceas_0, germany_pop),
".", '-', 4, "Germany"),
(days_tot, pop_perc(france_deceas_0, france_pop),
".", '-', 3, "France"),
(days_tot, pop_perc(switzerland_deceas_0, switzerland_pop),
".", '-', 6, "Switzerland"),
(days_tot, pop_perc(luxembourg_deceas_0, luxembourg_pop),
".", '-', 9, "Luxembourg"),
(days_tot, pop_perc(belgium_deceas_0, belgium_pop),
".", '-', 8, "Belgium"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative deceased cases "\
"in Finland and other European Countries \n"\
"in percentage of each Country population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
cust_line_plot((days_tot, pop_perc(finland_deceas_0, finland_pop),
".", '-', 0, "Finland"),
(days_tot, pop_perc(netherlands_deceas_0, netherlands_pop), ".", '-', 4,
"Netherlands"),
(days_tot, pop_perc(austria_deceas_0, austria_pop), ".", '-', 3,
"Austria"),
(days_tot, pop_perc(portugal_deceas_0, portugal_pop), ".", '-', 2,
"Portugal"),
(days_tot, pop_perc(uk_deceas_0, uk_pop), ".", '-', 6,
"UK"),
(days_tot, pop_perc(us_deceas_0, us_pop), ".", '-', 5,
"US"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative deceased cases "\
"in Finland and other European Countries + UK & US \n"\
"in percentage of each Country population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
By dividing the cumulative confirmed cases by the population and by the population density, we try to eliminate the effect of those two variables. Assuming that the pollution level is correlated with the population density (which is not necessarily true), and ignoring the possible effects of median age, genetics, climate (latitude) and possible virus mutations, the resulting curve might give an indication of the effect of the containment policies and of people's behavior. Those assumptions might be reasonable when comparing the Scandinavian Countries.
NOTE: The testing policies of different Countries might differ, so the cumulative confirmed cases are not necessarily a good statistical representation of the actual number of infections. Also, this approach assumes that the population is uniformly distributed over the whole territory. Countries like Iceland (or Norway), where the population is concentrated in a few locations, are penalized too heavily by this normalization.
The result shows overlapping curves for Finland and Sweden (even though Sweden initially followed more relaxed containment measures). The Iceland curve is an outlier.
Among the European Countries, only Spain shows a higher curve than Finland, while Belgium and Germany show quite low curves that are close to each other.
Another curve that is higher than the Finnish one is that of the US.
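The double normalization can be sketched as follows. The helper name `normalize_by_population_and_density`, the use of a land area in km² to derive the density, and all the numbers are assumptions made for illustration; the notebook itself divides the `*_perc` series by the `*_dens` values defined elsewhere.

```python
def normalize_by_population_and_density(cumulative, population, area_km2):
    """Percentage of the population, further divided by the population
    density (inhabitants per square km)."""
    density = population / area_km2
    return [100.0 * value / population / density for value in cumulative]


# Made-up Country: 1 million inhabitants on 10 000 km2 (density 100)
example_norm = normalize_by_population_and_density(
    [0, 100], population=1_000_000, area_km2=10_000)
print(example_norm)
```

Since the density is the population divided by the area, the normalized value is proportional to cases × area / population². A large and sparsely populated Country is therefore scaled up strongly, which is consistent with the note above about Iceland and Norway.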
# Comparing cumulative confirmed cases over time for Finland,
# and other Scandinavian Countries in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, finland_conf_0_perc/finland_dens, ".", '-', 0, "Finland"),
(days_tot, denmark_conf_0_perc/denmark_dens, ".", '-', 3, "Denmark"),
(days_tot, norway_conf_0_perc/norway_dens, ".", '-', 6, "Norway"),
(days_tot, sweden_conf_0_perc/sweden_dens, ".", '-', 8, "Sweden"),
#(days_tot, iceland_conf_0_perc/iceland_dens, ".", '-', 4, "Iceland"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Finland and other Scandinavian Countries \n"\
"in percentage of each Country population "\
"and normalized by the density of population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, finland_conf_0_perc/finland_dens, ".", '-', 0, "Finland"),
(days_tot, italy_conf_0_perc/italy_dens, ".", '-', 2, "Italy"),
(days_tot, spain_conf_0_perc/spain_dens, ".", '-', 1, "Spain"),
(days_tot, germany_conf_0_perc/germany_dens, ".", '-', 4, "Germany"),
(days_tot, france_conf_0_perc/france_dens, ".", '-', 3, "France"),
(days_tot, switzerland_conf_0_perc/switzerland_dens, ".", '-', 6,
"Switzerland"),
(days_tot, luxembourg_conf_0_perc/luxembourg_dens, ".", '-', 9,
"Luxembourg"),
(days_tot, belgium_conf_0_perc/belgium_dens, ".", '-', 8, "Belgium"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Finland and other European Countries \n"\
"in percentage of each Country population "\
"and normalized by the density of population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, finland_conf_0_perc/finland_dens, ".", '-', 0, "Finland"),
(days_tot, netherlands_conf_0_perc/netherlands_dens, ".", '-', 4,
"Netherlands"),
(days_tot, austria_conf_0_perc/austria_dens, ".", '-', 3, "Austria"),
(days_tot, portugal_conf_0_perc/portugal_dens, ".", '-', 2, "Portugal"),
(days_tot, uk_conf_0_perc/uk_dens, ".", '-', 6, "UK"),
(days_tot, us_conf_0_perc/us_dens, ".", '-', 5, "US"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative confirmed cases "\
"in Finland and other European Countries + UK & US \n"\
"in percentage of each Country population "\
"and normalized by the density of population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
In an attempt to eliminate the variability due to the different testing policies in different Countries, let's draw similar plots using the cumulative deceased cases rather than the cumulative confirmed cases as the reference curve.
The result is that Finland has the lowest curve and Sweden has the highest curve in Scandinavia. In Europe, France, Italy and Spain have higher curves than Finland, and Germany has the lowest curve. The US curve is also higher than the Finnish one.
# Comparing cumulative deceased cases over time for Finland,
# and other Scandinavian Countries in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, pop_perc(finland_deceas_0, finland_pop)/finland_dens,
".", '-', 0, "Finland"),
(days_tot, pop_perc(denmark_deceas_0, denmark_pop)/denmark_dens,
".", '-', 3, "Denmark"),
(days_tot, pop_perc(norway_deceas_0, norway_pop)/norway_dens,
".", '-', 6, "Norway"),
(days_tot, pop_perc(sweden_deceas_0, sweden_pop)/sweden_dens,
".", '-', 8, "Sweden"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative deceased cases "\
"in Finland and other Scandinavian Countries \n"\
"in percentage of each Country population "\
"and normalized by the density of population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, pop_perc(finland_deceas_0, finland_pop)/finland_dens,
".", '-', 0, "Finland"),
(days_tot, pop_perc(italy_deceas_0, italy_pop)/italy_dens,
".", '-', 2, "Italy"),
(days_tot, pop_perc(spain_deceas_0, spain_pop)/spain_dens,
".", '-', 1, "Spain"),
(days_tot, pop_perc(germany_deceas_0, germany_pop)/germany_dens,
".", '-', 4, "Germany"),
(days_tot, pop_perc(france_deceas_0, france_pop)/france_dens,
".", '-', 3, "France"),
(days_tot,
pop_perc(switzerland_deceas_0, switzerland_pop)/switzerland_dens,
".", '-', 6,
"Switzerland"),
(days_tot,
pop_perc(luxembourg_deceas_0, luxembourg_pop)/luxembourg_dens,
".", '-', 9,
"Luxembourg"),
(days_tot, pop_perc(belgium_deceas_0, belgium_pop)/belgium_dens,
".", '-', 8, "Belgium"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative deceased cases "\
"in Finland and other European Countries \n"\
"in percentage of each Country population "\
"and normalized by the density of population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, pop_perc(finland_deceas_0, finland_pop)/finland_dens,
".", '-', 0, "Finland"),
(days_tot,
pop_perc(netherlands_deceas_0, netherlands_pop)/netherlands_dens,
".", '-', 4,
"Netherlands"),
(days_tot, pop_perc(austria_deceas_0, austria_pop)/austria_dens,
".", '-', 3, "Austria"),
(days_tot, pop_perc(portugal_deceas_0, portugal_pop)/portugal_dens,
".", '-', 2, "Portugal"),
(days_tot, pop_perc(uk_deceas_0, uk_pop)/uk_dens,
".", '-', 6, "UK"),
(days_tot, pop_perc(us_deceas_0, us_pop)/us_dens,
".", '-', 5, "US"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative deceased cases "\
"in Finland and other European Countries + UK & US \n"\
"in percentage of each Country population "\
"and normalized by the density of population",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
Comments to the plot
The first plot shows that the shapes of the cumulative curves for the two areas in the Country are very similar. The main visible difference is the amplitude of the curves.
It appears that the number of confirmed cases in China outside Hubei province shows a tendency to grow again.
# Plotting daily cumulative cases in Hubei
plot_stacked_bar(days_tot,
[hubei_deceas_0, hubei_recov_0, hubei_act_0],
["deceased cases", "recovered cases", "active cases"],
col=[3, 2, 1],
multidim=True, figsize_w=18, figsize_h=12,
title="COVID-19 cumulative cases in Hubei (China) over time",
title_fs=18,
frame=False,
category_labels=days_tot,
label_fs = 12, ticks_fs=12,
x_label="month/day", rot=90,
y_label="confirmed cases",
legend=True, legend_loc = 2, legend_fs=12,
add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
# Plotting daily cumulative cases in Hubei
cust_line_plot((days_tot, hubei_conf_0, ".", '-', 0, "confirmed cases"),
(days_tot, hubei_recov_0, ".", '-', 2, "recovered cases"),
(days_tot, hubei_deceas_0, ".", '-', 3, "deceased cases"),
(days_tot, hubei_act_0, ".", '-', 1, "active cases"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative cases in Hubei (China) over time",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Plotting daily increments in confirmed cases in Hubei province in China
cust_bar_plot((days_tot, hubei_conf_incr_0, 0, ""),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases "\
"in Hubei province (China)",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=False,
leg_fs=12,
legend_loc=0)
Comment to the plot above
The data from 2/12 has been reported on 2/13.
# Plotting daily cumulative cases in the rest of China
plot_stacked_bar(days_tot,
[restchina_deceas_0, restchina_recov_0, restchina_act_0],
["deceased cases", "recovered cases", "active cases"],
col=[3, 2, 1],
multidim=True, figsize_w=18, figsize_h=12,
title="COVID-19 cumulative cases in China other than Hubei over time",
title_fs=18,
frame=False,
category_labels=days_tot,
label_fs = 12, ticks_fs=12,
x_label="month/day", rot=90,
y_label="confirmed cases",
legend=True, legend_loc = 2, legend_fs=12,
add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
# Plotting daily cumulative cases in the rest of China
cust_line_plot((days_tot, restchina_conf_0, ".", '-', 0, "confirmed cases"),
(days_tot, restchina_recov_0, ".", '-', 2, "recovered cases"),
(days_tot, restchina_deceas_0, ".", '-', 3, "deceased cases"),
(days_tot, restchina_act_0, ".", '-', 1, "active cases"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative cases in China "\
"other than Hubei over time",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Plotting daily increments in confirmed cases in the rest of China
cust_bar_plot((days_tot, restchina_conf_incr_0, 0, ""),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases in China "\
"other than Hubei",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=False,
leg_fs=12,
legend_loc=0)
# Plotting daily cumulative cases in Italy
cust_line_plot((days_tot, italy_conf_0, ".", '-', 0, "confirmed cases"),
(days_tot, italy_recov_0, ".", '-', 2, "recovered cases"),
(days_tot, italy_deceas_0, ".", '-', 3, "deceased cases"),
(days_tot, italy_act_0, ".", '-', 1, "active cases"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative cases in Italy over time",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Plotting daily cumulative cases in Italy
plot_stacked_bar(days_tot,
[italy_deceas_0, italy_recov_0, italy_act_0],
["deceased cases", "recovered cases", "active cases"],
col=[3, 2, 1],
multidim=True, figsize_w=18, figsize_h=12,
title="COVID-19 cumulative cases in Italy over time",
title_fs=18,
frame=False,
category_labels=days_tot,
label_fs = 12, ticks_fs=12,
x_label="month/day", rot=90,
y_label="confirmed cases",
legend=True, legend_loc = 2, legend_fs=12,
add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
# Plotting new daily confirmed Coronavirus cases in Italy
cust_bar_plot((days_tot, italy_conf_incr_0, 0, ""),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases in Italy",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=False,
leg_fs=12,
legend_loc=0)
# Plotting increments in the active cases in Italy
cust_bar_plot((days_tot, calc_increments(italy_act_0), 1, ""),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 increments in the active cases in Italy",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=False,
leg_fs=12,
legend_loc=0)
Comment to the plot above
The data from 3/12 has been reported on 3/13.
# Plotting new daily deceased cases in Italy
cust_bar_plot((days_tot, calc_increments(italy_deceas_0), 3, ""),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily deceased cases in Italy",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=False,
leg_fs=12,
legend_loc=0)
Comment to the plots
March 30th.
It is interesting to compare the following two plots to the Chinese curves in order to get a sense of how long the first wave of the emergency might last in the whole world. To do this, it is important to note that the Chinese curves do not start from a zero cumulative value. In other words, the Chinese curves are missing at least a full month. Therefore, we can only compare the Chinese curves to the second half of the world curves.
It looks like the most recent half of the world curves corresponds to less than 1/4 of the expected length of the curve, and it took a month to go through that part. So, everything else being equal, the first world-wide wave might last at least another 3 months. However, many variables might affect the development of the world curves, and their values might differ from the Chinese ones.
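The arithmetic behind this rough estimate can be written down explicitly. This is only a sketch of the reasoning above, with a hypothetical function name; it inherits all the assumptions of the comparison with the Chinese curves and is in no way a forecast.

```python
def remaining_months_lower_bound(elapsed_months, fraction_covered_upper_bound):
    """If the elapsed time covers at most the given fraction of the
    whole curve, return the minimum remaining time."""
    total_lower_bound = elapsed_months / fraction_covered_upper_bound
    return total_lower_bound - elapsed_months


# One month covers less than 1/4 of the expected curve
print(remaining_months_lower_bound(1, 0.25))  # 3.0
```

Because the covered fraction is less than 1/4, the whole curve lasts more than 4 months, and more than 3 months are left.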
# Plotting daily cumulative cases in the whole world
cust_line_plot((days_tot, world_conf_tot, ".", '-', 0, "confirmed cases"),
(days_tot, world_recov_tot, ".", '-', 2, "recovered cases"),
(days_tot, world_deceas_tot, ".", '-', 3, "deceased cases"),
(days_tot, world_act_tot, ".", '-', 1, "active cases"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 cumulative cases in the whole world "\
"over time",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0)
# Plotting new daily cases in the whole world
cust_bar_plot((days_tot, world_conf_incr, 0, ""),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily confirmed cases in the whole world",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=False,
leg_fs=12,
legend_loc=0)
# Plotting increments in the active cases in the whole world
cust_bar_plot((days_tot, calc_increments(world_act_tot), 1, ""),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 increments in the active cases "\
"in the whole world",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=False,
leg_fs=12,
legend_loc=0)
Comment on the plot below
The estimated average daily number of deaths from other causes has been added solely to put the numbers into context.
It should be noted that at the beginning of the curve the deaths from COVID-19 grow exponentially, and therefore the area plotted in red might easily become larger than the areas under the horizontal lines (which would mean that the cumulative deaths from COVID-19 are higher than the cumulative deaths from the corresponding cause represented by the horizontal line).
Also, the deaths from COVID-19 might be underestimated because not the whole population is tested.
On the other hand, the average deaths from seasonal flu in 2020 might be lower than normal thanks to the stricter hand hygiene adopted because of the novel Coronavirus. Similarly, the deaths from road traffic accidents might be slightly lower than expected because containment measures have reduced people's mobility.
Finally, let's remember that the R0 of the Coronavirus is estimated to be at least double that of the common flu (meaning it spreads faster), and the CFR (which measures the mortality) is estimated to be one order of magnitude (that is, 10 times) bigger (there is a lot of uncertainty about this last value).
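The point about the red area overtaking the areas under the horizontal lines can be illustrated with a small sketch. It compares a daily count that grows by a constant factor with a constant daily baseline and returns the first day on which the cumulative exponential total is larger. The function name and all the numbers used below are hypothetical illustration values, not estimates taken from the data in this notebook.

```python
def first_day_cumulative_exceeds(baseline_per_day, start_per_day,
                                 growth_factor, max_days=3650):
    """Return the first day (1-based) on which the cumulative sum of an
    exponentially growing daily count exceeds the cumulative sum of a
    constant daily baseline, or None if that does not happen within
    max_days. Hypothetical helper for illustration only.
    """
    cumulative = 0.0
    daily = float(start_per_day)
    for day in range(1, max_days + 1):
        cumulative += daily
        # Area under the horizontal line after `day` days
        if cumulative > baseline_per_day * day:
            return day
        daily *= growth_factor
    return None


# Hypothetical values: baseline of 10 per day, growth starting at
# 1 per day and doubling every day
print(first_day_cumulative_exceeds(10, 1, 2))
```

With no growth (a factor of 1) and a starting value below the baseline, the function returns `None`, which corresponds to the red area never catching up.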
# Plotting new daily deceased cases worldwide
cust_bar_plot((days_tot, calc_increments(world_deceas_tot), 3,
"Daily deceased cases by COVID-19"),
figsize_w=18, figsize_h=12,
title="Coronavirus COVID-19 new daily (reported) deceased cases "\
"worldwide",
title_fs=18, title_offset=20,
rem_borders=True,
label_fs=12, tick_fs=12,
x_label="month/day",
rot=90,
y_label=None,
legend=True,
leg_fs=12,
legend_loc=0,
first_line_y=1288,
first_line_y_l="Average daily deaths by seasonal flu",
second_line_y=2192,
second_line_y_l="Average daily deaths by suicides",
third_line_y=3561,
third_line_y_l="Average daily estimated number of deaths "\
"by road traffic accidents")
# sources for the additional info:
# https://www.worldometers.info/
# https://www.who.int/mental_health/prevention/suicide/suicideprevent/en/
# https://www.who.int/mediacentre/events/meetings/2011/road_safety/en/
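# Note: the +21 offset below presumably accounts for the days of January
# that precede the first day in the source data (1-21 January)
print("Reported deaths by COVID-19 so far this year:",
world_deceas_tot.iloc[-1])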
print("Estimated deaths by seasonal flu so far this year:",
1288*(len(days_tot)+21))
print("Estimated deaths by suicides so far this year:",
2192*(len(days_tot)+21))
print("Estimated deaths by road traffic accidents so far this year:",
3561*(len(days_tot)+21))
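The year-to-date estimates above multiply an average daily figure by the number of days elapsed since 1 January, which the code obtains as `len(days_tot)+21`. As a hedged alternative, the same day count can be derived from the date itself. The sketch below assumes that the last plotted day is available as a string in the `%m-%d-%Y` format (as `last_day` is at the end of this notebook); the helper names are hypothetical.

```python
import datetime as dt


def days_elapsed_in_year(last_day, date_format="%m-%d-%Y"):
    """Return the day of the year (1 January = 1) for a date string.
    Hypothetical helper; the format is assumed to match `last_day`.
    """
    parsed = dt.datetime.strptime(last_day, date_format).date()
    return parsed.timetuple().tm_yday


def estimated_deaths_so_far(avg_daily_deaths, last_day):
    """Scale an average daily figure to the elapsed part of the year."""
    return avg_daily_deaths * days_elapsed_in_year(last_day)


# Example with an arbitrary date (not necessarily the last plotted day)
print("Days elapsed:", days_elapsed_in_year("04-10-2020"))
print("Estimated deaths by seasonal flu so far this year:",
      estimated_deaths_so_far(1288, "04-10-2020"))
```

If the time series starts on 22 January and has no missing days, this gives the same count as `len(days_tot)+21`, without relying on a hard-coded offset.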
# Reordering the columns
daily_rep_group = daily_rep_group.reindex(columns=['Confirmed',
'Recovered',
'Deaths',
'Active'])
print("Grand Total Worldwide:\n")
print(daily_rep_group.sum().to_string())
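# Mortality (worldwide): cumulative deaths over cumulative confirmed cases
# (columns are selected by label rather than by position)
totals = daily_rep_group.sum()
mort = (totals['Deaths']/totals['Confirmed'])*100
print("'Calculated' mortality worldwide: {:.2f}%\n".format(mort))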
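The calculated mortality is a plain ratio of cumulative deaths to cumulative confirmed cases. Because deaths follow infections with a delay, one rough way to see the effect of that delay is to divide the latest deaths by the confirmed cases of some days earlier. The sketch below is only an illustration under stated assumptions: the function names, the lag value and the sample numbers are all hypothetical, and the lag adjustment is not the method used in this notebook.

```python
def naive_cfr(confirmed, deaths):
    """Latest cumulative deaths over latest cumulative confirmed cases,
    as a percentage."""
    return deaths[-1] / confirmed[-1] * 100


def lagged_cfr(confirmed, deaths, lag_days):
    """Latest cumulative deaths over the cumulative confirmed cases
    reported lag_days earlier, as a percentage (hypothetical helper)."""
    if lag_days < 0 or lag_days >= len(confirmed):
        raise ValueError("lag_days must be between 0 and len(confirmed) - 1")
    return deaths[-1] / confirmed[-1 - lag_days] * 100


# Hypothetical cumulative series while cases are still doubling
confirmed = [100, 200, 400, 800]
deaths = [1, 2, 4, 8]
print("Naive: {:.2f}%".format(naive_cfr(confirmed, deaths)))
print("Lagged by 2 days: {:.2f}%".format(lagged_cfr(confirmed, deaths, 2)))
```

While the confirmed cases keep increasing, the lagged ratio is higher than the naive one, which is the under-estimation effect mentioned in the note. Untested infections push in the opposite direction, and neither ratio corrects for them.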
print("IMPORTANT NOTE:\nThe actual mortality could be much lower",
"due to the fact that not all infected people\nhave been tested!\n"
"On the other hand, the counted deaths are due to infections that happened",
"weeks ago.\nThis means that, as long as the contagious cases increase, "
"the calculated mortality\nis under-estimated.")
# The top 10 Countries by number of confirmed cases in descending order
conf_top_10 = daily_rep_group.sort_values(by='Confirmed', ascending=False).\
head(10)['Confirmed']
# Showing the top 10 Countries by number of confirmed cases in a bar plot
plot_cust_hbar(conf_top_10.sort_values(),
figsize_w=16, figsize_h=12,
frame=False, grid=False,
ref_font_size=14,
title_text="Countries by number of confirmed cases "\
"in descending order (top 10)",
title_offset=20,
color_numb=0,
categ_labels=True,
labels=None,
rot=0,
show_values=True,
omitted_value=0,
percent=False,
center_al=True,
visible_digits=2)
# The top 10 Countries by number of recovered cases in descending order
recov_top_10 = daily_rep_group.sort_values(by='Recovered', ascending=False).\
head(10)['Recovered']
# Showing the top 10 Countries by number of recovered cases in a bar plot
plot_cust_hbar(recov_top_10.sort_values(),
figsize_w=16, figsize_h=12,
frame=False, grid=False,
ref_font_size=14,
title_text="Countries by number of recovered cases "\
"in descending order (top 10)",
title_offset=20,
color_numb=2,
categ_labels=True,
labels=None,
rot=0,
show_values=True,
omitted_value=0,
percent=False,
center_al=True,
visible_digits=2)
# The top 10 Countries by number of deceased cases in descending order
deceas_top_10 = daily_rep_group.sort_values(by='Deaths', ascending=False).\
head(10)['Deaths']
# Showing the top 10 Countries by number of deceased cases in a bar plot
plot_cust_hbar(deceas_top_10.sort_values(),
figsize_w=16, figsize_h=12,
frame=False, grid=False,
ref_font_size=14,
title_text="Countries by number of deceased cases "\
"in descending order (top 10)",
title_offset=20,
color_numb=3,
categ_labels=True,
labels=None,
rot=0,
show_values=True,
omitted_value=0,
percent=False,
center_al=True,
visible_digits=2)
# Fixing the current number of active cases in US
# (due to a mistake in the source data):
# active = confirmed - recovered - deaths
daily_rep_group.at['US', 'Active'] = \
daily_rep_group.at['US', 'Confirmed'] - \
daily_rep_group.at['US', 'Recovered'] - \
daily_rep_group.at['US', 'Deaths']
# The top 10 Countries by number of active cases in descending order
act_top_10 = daily_rep_group.sort_values(by='Active', ascending=False).\
head(10)['Active']
# Showing the top 10 Countries by number of active cases in a bar plot
plot_cust_hbar(act_top_10.sort_values(),
figsize_w=16, figsize_h=12,
frame=False, grid=False,
ref_font_size=14,
title_text="Countries by number of active cases "\
"in descending order (top 10)",
title_offset=20,
color_numb=1,
categ_labels=True,
labels=None,
rot=0,
show_values=True,
omitted_value=0,
percent=False,
center_al=False,
visible_digits=2)
print("\n(*) Note that for certain Countries the figures in the previous three tables",
"also include offshore territories.")
print("For example, for France the numbers include:\n\n",
"- French Polynesia\n",
"- New Caledonia\n",
"- St Martin\n",
"- Saint Barthelemy\n",
"- French Guiana\n",
"- Guadeloupe\n",
"- Mayotte\n",
"- Reunion\n")
# Visualizing the current status in Finland
print("Latest situation in Finland:\n")
print(daily_rep_group.loc['Finland'].to_string())
The plots are briefly described in section 7.
Because the notebook is updated daily, no additional comments are added here.
Many thanks to Johns Hopkins University for sharing the source csv files and maintaining them daily.
Many thanks to Coursera for providing a very informative course.
Many thanks to colleagues and friends who have contributed by providing links and comments.
print("Last plotted day:", dt.datetime.strptime(last_day, "%m-%d-%Y").\
date().strftime("%d-%b-%Y"))
Used software:
- Jupyter Notebook server 6.0.1
- Python 3.6.8
- numpy 1.18.2
- pandas 1.0.3
- matplotlib 3.1.2
- seaborn 0.9.0
- regex 2019.8.19
on top of Linux Ubuntu 18.04